NCSC-TG-022
A Guide to Understanding Trusted Recovery in Trusted Systems
Table of Contents
FOREWORD
ACKNOWLEDGMENTS
1 INTRODUCTION
1.1 BACKGROUND
1.2 PURPOSE
1.3 SCOPE
1.4 CONTROL OBJECTIVE
1.5 DOCUMENT OVERVIEW
2 FAILURES, DISCONTINUITIES, AND RECOVERY
2.1 STATE-TRANSITION (ACTION) FAILURES
2.2 TCB FAILURES
2.3 MEDIA FAILURES
2.4 DISCONTINUITY OF OPERATION
3 PROPERTIES OF TRUSTED RECOVERY
3.1 SECURE STATES
3.2 SECURE STATE TRANSITIONS
4 DESIGN APPROACHES FOR TRUSTED RECOVERY
4.1 RESPONSIBILITY FOR TRUSTED RECOVERY
4.2 SOME PRACTICAL DIFFICULTIES WITH CURRENT FORMALISMS
4.3 SUMMARY OF CURRENT APPROACHES TO RECOVERY
4.3.1 Types of System Recovery
4.3.2 Current Approaches
4.3.3 Implementation of Atomic State Transitions
4.3.3.1 Shadowing
4.3.3.2 Logging
4.3.3.3 Logging and Shadowing
4.3.4 Recovery with Non-Atomic State Transitions
4.3.4.1 Sources of Inconsistency---A Generic Example
4.3.4.2 Non-Atomic TCB Primitives
4.3.4.3 Idempotency of Recovery Procedures
4.3.4.4 Recovery With Non-Atomic System Primitives
4.4 DESIGN OPTIONS FOR TRUSTED RECOVERY
5 IMPACT OF OTHER TCSEC REQUIREMENTS ON TRUSTED RECOVERY
5.1 OPERATIONAL ASSURANCE
5.2 LIFE-CYCLE ASSURANCE
5.2.1 Security Testing
5.2.2 Design Specification and Verification
5.2.3 Configuration Management
5.2.4 Trusted Distribution
5.3 DOCUMENTATION
5.3.1 Trusted Facility Manual
5.3.2 Test Documentation
5.3.3 Design Documentation
6 SATISFYING THE TCSEC REQUIREMENTS
6.1 REQUIREMENTS FOR SECURITY CLASS B3
6.1.1 Operational Assurance
6.1.1.1 System Architecture
6.1.1.2 Trusted Facility Management
6.1.2 Life-Cycle Assurance
6.1.2.1 Security Testing
6.1.2.2 Design Specification and Verification
6.1.2.3 Configuration Management
6.1.3 Documentation
6.1.3.1 Trusted Facility Manual
6.1.3.2 Test Documentation
6.1.3.3 Design Documentation
6.2 ADDITIONAL REQUIREMENTS OF SECURITY CLASS A1
6.2.1 Additional Life-Cycle Assurance Requirements
6.2.1.1 Configuration Management
6.2.1.2 Trusted Distribution
GLOSSARY
BIBLIOGRAPHY
NCSC-TG-022
Library No. S-236,061 Version 1
FOREWORD
A Guide to Understanding Trusted Recovery in Trusted Systems provides a set of good practices related to
trusted recovery. We have written this guideline to help the vendor and evaluator community understand
the requirements for trusted recovery, as well as the level of detail required for trusted recovery at all
applicable classes, as described in the Department of Defense Trusted Computer Systems Evaluation
Criteria. In an effort to provide guidance, we make recommendations in this technical guideline that are
not requirements in the Criteria.
The Trusted Recovery Guide is the latest in a series of technical guidelines published by the National
Computer Security Center. These publications provide insight to the Trusted Computer Systems Evaluation
Criteria requirements for the computer security vendor and technical evaluator. The goal of the Technical
Guideline Program is to discuss each feature of the Criteria in detail and to provide the proper
interpretations with specific guidance.
The National Computer Security Center has established an aggressive program to study and implement
computer security technology. Our goal is to encourage the widespread availability of trusted computer
products for use by any organization desiring better protection of its important data. One way we do this
is by the Trusted Product Evaluation Program. This program focuses on the security features of
commercially produced and supported computer systems. We evaluate the protection capabilities against
the established criteria presented in the Trusted Computer System Evaluation Criteria. This program, and
an open and cooperative business relationship with the computer and telecommunications industries, will
result in the fulfillment of our country's information systems security requirements. We resolve to meet
the challenge of identifying trusted computer products suitable for use in processing information that
needs protection.
I invite your suggestions for revising this technical guideline. We will review this document as the need
arises.
30 December 1991
Patrick R. Gallagher, Jr.
Director
National Computer Security Center
ACKNOWLEDGMENTS
The National Computer Security Center extends special recognition and acknowledgment to Dr. Virgil D.
Gligor as the primary author of this document. James N. Menendez and Capt. James A. Muysenberg
(USAF) are recognized for the development of this guideline, and Capt. Muysenberg is recognized for its
editing and publication.
We wish to thank the many members of the computer security community who enthusiastically gave their
time and technical expertise in reviewing this guideline and providing valuable comments and
suggestions.
1 INTRODUCTION
1.1 BACKGROUND
The principal goal of the National Computer Security Center (NCSC) is to encourage the widespread
availability of trusted computer systems. In support of this goal the NCSC created a metric, the DoD
Trusted Computer System Evaluation Criteria (TCSEC) [17], against which computer systems could be
evaluated.
The TCSEC was originally published on 15 August 1983 as CSC-STD-001-83. In December 1985 the
Department of Defense adopted it, with a few changes, as a Department of Defense Standard, DoD
5200.28-STD. DoD Directive 5200.28, Security Requirements for Automatic Information Systems (AISs)
[10], requires the Department of Defense to use the TCSEC. The TCSEC is the standard used for
evaluating the effectiveness of security controls built into DoD AISs.
The TCSEC is divided into four divisions: D, C, B, and A. These divisions are ordered in a hierarchical
manner. The TCSEC reserves the highest division (A) for systems providing the best available level of
assurance. Within divisions C and B are subdivisions known as classes, which also are ordered in a
hierarchical manner to represent different levels of security in these divisions.
1.2 PURPOSE
An important assurance requirement of the TCSEC, which appears in classes B3 to A1, is trusted recovery.
The objective of trusted recovery is to ensure the maintenance of the security and accountability
properties of a system in the face of failures and discontinuities of operation. To accomplish this, a system
should incorporate a set of mechanisms enabling it to remain in a secure state whenever a well-defined
set of anticipated failures or discontinuities occur. It also should include a set of procedures enabling the
administrators to bring the system to a secure state whenever unanticipated failures or discontinuities
occur. (Chapter 6 explains the distinction between anticipated and unanticipated failures.)
Besides these mechanisms, the TCSEC's B3-A1 classes require the implementor to follow specific design
principles and practices, collectively called assurance measures. The TCSEC further requires the developer
to provide specific documentation evidence sufficient for an evaluator or accreditor to verify that the
mechanisms and assurances are sufficient to meet specified requirements.
This guide presents the issues involved in the design of trusted recovery. It provides guidance to
manufacturers on what functions of trusted recovery to incorporate into their systems. It also provides
guidance to system evaluators and accreditors on how to evaluate the design and implementation of
trusted recovery functions. This document contains suggestions and recommendations derived from
TCSEC objectives but which the TCSEC does not require. Examples in this document are not the only way
of accomplishing trusted recovery. Nor are the recommendations supplementary requirements to the
TCSEC. The only measure of TCSEC compliance is the TCSEC itself.
This guideline isn't a tutorial introduction to the topic of recovery. Instead, it's a summary of trusted
recovery issues that should be addressed by operating systems designed to satisfy the requirements of
the B3 and A1 classes. We assume the reader of this document is an operating system designer or
evaluator who is already familiar with the notion of recovery in operating systems. The guide explains the
security properties of system recovery (and the notion of trusted recovery). It also defines a set of
baseline requirements and recommendations for the design and evaluation of trusted recovery
mechanisms and assurance. The reader who is unfamiliar with the notion of system recovery and security
modeling required of B3 and A1 systems may find it useful to refer both to the recovery literature (such as
[1, 5, 14-16, 20-23, 25, 27]) and the security literature (such as [3,11, 26, 29]) cited in this guide.
1.3 SCOPE
Trusted recovery refers to mechanisms and procedures necessary to ensure that failures and
discontinuities of operation don't compromise a system's secure operation. The guidelines for trusted
recovery presented refer to the design of these mechanisms and procedures required for the classes B3
and A1 of the TCSEC. These guidelines apply to computer systems and products built or modified with the
intention of satisfying TCSEC requirements. We make additional recommendations derived from the stated
objectives of the TCSEC.
Not addressed are recovery measures designed to tolerate failures caused by physical attacks on ADP
equipment, natural disasters, water or fire damage, nor administrative measures that deal with such
events. The evaluation of these measures is beyond the scope of the TCSEC [17, p. 89].
1.4 CONTROL OBJECTIVE
Trusted recovery is one of the areas of operational assurance. The assurance control objective states:
"Systems that are used to process or handle classified or other sensitive information must be designed to
guarantee correct and accurate interpretation of the security policy and must not distort the intent of that
policy. Assurance must be provided that correct implementation and operation of the policy exists
throughout the system's life-cycle." [17, p. 63]
This objective affects trusted recovery in two important ways. First, the design and implementation of the
recovery mechanisms and procedures must satisfy the life-cycle assurance requirements of correct
implementation and operation. Second, both a system's administrative procedures and recovery
mechanisms should ensure correct enforcement of the system security policy in the face of system
failures and discontinuities of operation. The notions of failure and discontinuity of operation are defined
in Chapter 2.
1.5 DOCUMENT OVERVIEW
This guide contains five chapters besides this introductory chapter. Chapter 2 reviews the key notions of
failure, discontinuity of operation, and recovery. Chapter 3 discusses the properties of trusted recovery.
Chapter 4 presents recovery design approaches and options that can be used for trusted recovery.
Chapter 5 discusses the impact of the other TCSEC requirements on trusted recovery. Chapter 6 presents
TCSEC requirements that affect the design and implementation of trusted recovery functions, and includes
additional recommendations corresponding to B3-A1 evaluation classes. The glossary contains the
definitions of the significant terms used. Following this is a list of the references cited in the text.
2 FAILURES, DISCONTINUITIES, AND RECOVERY
The TCSEC requires for security classes B3 and A1 that:
"Procedures and/or mechanisms shall be provided to assure that, after an ADP system failure or
other discontinuity, recovery without a protection compromise is obtained." [17, p. 39]
In this chapter we discuss the notions of failure and discontinuity of Trusted Computing Base (TCB)
operations, and present an informal qualitative description of their effects on system states. We also
briefly present general recovery approaches used in practice. Throughout this chapter and document we
use the term "failure" for an event causing a system function to behave inconsistently with its informal
specification. We reserve the term "discontinuity" of operation for failures caused by user, administrator,
or operator action.
Recovery mechanisms of computer systems are designed to respond to anticipated failures or
discontinuities of operation. These mechanisms do not handle "unanticipated" failures nor "unanticipated"
discontinuities of operation; therefore, computer-system documentation should include descriptions of
administrative procedures to handle such events. In a well-designed system, unanticipated failures and
discontinuities of operation are events expected to occur with very low frequency, i.e., once or twice per
year. For this reason, administrative procedures, as opposed to automated mechanisms in the system,
represent an adequate response to unanticipated failures and discontinuities of operation, even when
these procedures are complex and extensive.
One can't establish formal models of failure and discontinuity of operation in which proofs demonstrate
the model's internal consistency. Neither physical systems, such as devices, processing units, and storage,
nor behaviors of users, administrators, and operators, have formal properties [21]. Therefore, formal
modeling and specification of expected failures and discontinuities of operation can't be required. Only
informal assumptions derived from operational experience can be made about expected failures,
discontinuities, their effects, and their frequencies. References [14, 15, 21] present examples of such
assumptions. These informal assumptions, which should be stated explicitly in system documentation,
form the basis for the design of the recovery mechanisms and the definition of the administrative recovery
procedures.
However, recovery mechanisms and administrative procedures must reconstruct consistent system states,
or prevent state transitions to inconsistent states, as a direct response to occurrences of expected failures
or discontinuities of operation [8, 9]. A system state is "consistent" if the variables defining it satisfy given
predicates expressing formally or informally invariant properties of the system, discussed in Section 3.1. A
"state transition" is a function which changes the variables of a system state in a specified way, i.e.,
specified as constraints on the system's rules of operation, discussed in Section 3.2. Therefore, the design of recovery mechanisms and administrative procedures should use the invariant properties and state-transition constraints of the security model defined for the system, viz., the discussion in Chapter 3.
The role of recovery mechanisms and of trusted recovery can be best understood by illustrating the effect
of failures and discontinuities of operation on typical systems. Informal and qualitative assumptions of
failures derived from operational experience with various systems have been presented in the literature
[14,15, 21]. Using these informal assumptions we can define general classes of failures that affect the
operation of a TCB.
One class of failures is identical to the class of errors caused when users pass wrong parameters to TCB
primitives, or invoke the wrong TCB primitives, and when system resources are exhausted or found in an
inconsistent state because of user actions. These are called state-transition failures or action failures. We
cover this type of user-induced failure, which falls more naturally in the area of exception processing, for
two reasons: (1) the failures of this class are, nevertheless, TCB domain failures regardless of their cause;
and (2) the processing of these failures---not just their specification and documentation---is relevant to
system security.
For example, incorrect error processing can bring the system into a state where a user cannot
communicate with the TCB, or can contribute to the mishandling of covert channels. However, we place
the major emphasis in this guideline on the more traditional notions of failure, namely TCB failures, media
failures, and administrator-induced discontinuity of operation.
2.1 STATE-TRANSITION (ACTION) FAILURES
State-transition failures, also called action failures, occur whenever a TCB primitive, which causes a state
transition, cannot complete its function because it detects exceptional conditions during its execution.
State-transition failures can be caused by bad parameters passed to TCB primitives, by exhaustion of
resource limits, by missing objects needed during TCB primitive execution, and so on.
The effects of state-transition failures on TCB states are not as far-reaching as those of other failures.
Because these failures occur often, the code of TCB primitives usually includes recovery mechanisms that
undo the temporary modifications of system states before the primitive's return, thus returning the
system to a consistent state. If the recovery mechanisms of TCB primitives fail to undo temporary
modifications of system states, the system may remain in an inconsistent state and eventually crash. A
crash is a failure that causes the processors' registers to be reset to some standard values [21]. Because
consistent system states cannot be recovered from processor and primary memory registers after a crash,
these registers are referred to as "volatile" storage. In contrast, consistent system states can usually be
recovered from magnetic media such as disks and tapes; these media are called "nonvolatile" storage.
Examples of recovery mechanisms included in TCB primitives to undo temporary state modifications after
state-transition failures are found in most contemporary operating systems. For instance, consider the
"creat" primitive of a hypothetical UNIX system which allocates i-node table entries before allocating file table entries [1]. If the file table is full at the time the "creat" call is made, a state-transition failure
would occur. Before returning to the caller, the recovery code of "creat" deallocates the i-node table entry
allocated for the file that couldn't be created. Failure to deallocate such entries would cause the i-node
table to fill up and remain full, causing a system crash.
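This cleanup pattern can be illustrated with the following toy C program. It is a minimal sketch rather than actual UNIX kernel code; the names ("toy_creat", "alloc_slot") and the two fixed-size tables are hypothetical simplifications of the i-node and file tables discussed above.

    /* Sketch of a "creat"-like primitive that undoes its partial
     * allocation when a state-transition (action) failure occurs.
     * All names and table sizes are hypothetical. */
    #include <stdio.h>

    #define NINODE 4                 /* toy i-node table size */
    #define NFILE  2                 /* toy file table size   */

    static int inode_used[NINODE];
    static int file_used[NFILE];

    static int alloc_slot(int *table, int n)
    {
        for (int i = 0; i < n; i++)
            if (!table[i]) { table[i] = 1; return i; }
        return -1;                   /* table exhausted */
    }

    /* Returns 0 on success, -1 on a state-transition failure. */
    static int toy_creat(void)
    {
        int ino = alloc_slot(inode_used, NINODE);
        if (ino < 0)
            return -1;               /* nothing allocated yet; state unchanged */

        int fd = alloc_slot(file_used, NFILE);
        if (fd < 0) {
            inode_used[ino] = 0;     /* undo the i-node allocation before
                                        returning, restoring a consistent state */
            return -1;
        }
        /* ... bind the file table entry to the i-node, etc. ... */
        return 0;
    }

    int main(void)
    {
        for (int i = 0; i < 5; i++)
            printf("creat %d: %s\n", i,
                   toy_creat() == 0 ? "ok" : "failed, allocation undone");
        return 0;
    }

Without the single undo statement, every failed call would leak an i-node table entry, so the i-node table would eventually fill up and remain full, which is exactly the crash scenario described above.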
2.2 TCB FAILURES
TCB failures occur whenever the TCB code detects an error below the TCB primitives' interface which
can't be fixed; i.e., the error cannot be masked. TCB failures are caused by persistent inconsistencies in
critical system tables, by wild branches of the TCB code (possibly caused by transient hardware failures),
by power failures, by processor failures, and so on. TCB failures always cause a system crash.
In systems providing a high degree of hardware fault tolerance, system crashes still occur because of
software errors. Since crashes cause volatile storage to be lost, and since nonvolatile media usually
survive crashes, recovery mechanisms can reconstruct consistent states in a maintenance mode of
operation. After reconstructing a consistent state, the recovery mechanisms restart the system with no
process execution in progress, e.g., processes that were active, blocked, or swapped out before the crash
are aborted. New processes, which run the code of aborted processes executing at the time of the crash,
can be started by users after the consistent state is reconstructed. Recovery mechanisms can reconstruct
consistent states by either removing or completing incomplete updates of various objects represented on
nonvolatile media. Properties of and design approaches for recovery mechanisms able to reconstruct
consistent states from nonvolatile storage after TCB failures are discussed in Section 3.2 and Chapter 4.
Some TCB failures allow a system to shut down in an orderly manner. These failures may be caused by
process swap-space exhaustion, timer-interrupt table exhaustion, and, in general, by conditions that can't
be handled by TCB primitives themselves in normal modes of operation. Traps originated by persistent
hardware failures, such as memory and bus parity errors, also may cause failures.
2.3 MEDIA FAILURES
Media failures occur whenever errors are detected on some nonvolatile storage device that the TCB
cannot fix (i.e., the errors can't be masked). Media failures are caused by hardware failures such as disk
head crashes, persistent read/write failures due to misaligned heads, worn-out magnetic coating, dust on
the disk surface, and so on. They also are caused by software failures such as TCB failures which make
media unreadable.
The effect of media failures is that part, or all, of the media representing TCB objects become inaccessible
and corrupt. Data structures relevant to system security also may be corrupted by media failures, e.g.,
object security labels. The system usually crashes unless the lost data can be retrieved from archival
storage and rebuilt on a redundant storage device. Of course, media failures that don't affect TCB objects
may not cause system crashes. If redundant media aren't available, or if users and administrators don't
keep archival data up-to-date, media failures may become unrecoverable failures. Administrative recovery
procedures may have to be used to bring the system to a consistent state. As discussed in Chapters 5 and
6, all these procedures should be explained in the system's Trusted Facility Manual.
2.4 DISCONTINUITY OF OPERATION
Failures induced by users, administrators, and operators cause discontinuities of operation. Inside an
operating system, discontinuities of operation manifest themselves most often as state-transition failures,
TCB failures, and, less often, as media failures. They are caused by erroneous actions, such as
unexpected system shutdowns, e.g., by turning off the power. Also, they can be caused by lack of action,
such as ignoring the exhaustion of critical system resources under administrative control despite
documented or on-line warnings, e.g., audit trail is 95% full, insufficient swap space left, inadequate
configuration installed, etc.
The effects of discontinuities of operation are the same as those of the state-transition and TCB failures
mentioned above. Recovery mechanisms or administrative procedures necessary for the reconstruction of
a consistent state also are correspondingly similar to those used for failures. For example, cancellation of
a TCB primitive call by depressing the "break" key during the call's execution might have the same effect
as a state-transition failure detected by the TCB primitive. Each TCB primitive and state transition would
have to be designed either to ignore user cancellation signals during execution of critical code sections or
to clean up internal data structures during the processing of such signals.
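At user level, the first of these two designs can be approximated with standard POSIX signal masking, as the following minimal C sketch shows. Blocking defers signal delivery rather than discarding it; the "critical_update" function and the choice of SIGINT as the cancellation signal are illustrative assumptions.

    /* Sketch: shielding a critical code section from user cancellation
     * (e.g., the "break" key, typically delivered as SIGINT). */
    #include <signal.h>
    #include <stdio.h>

    static void critical_update(void)
    {
        /* ... modify linked internal data structures; an interruption
           here would leave them inconsistent ... */
    }

    int main(void)
    {
        sigset_t block, saved;
        sigemptyset(&block);
        sigaddset(&block, SIGINT);               /* the cancellation signal  */

        sigprocmask(SIG_BLOCK, &block, &saved);  /* defer cancellation ...   */
        critical_update();                       /* ... across this section  */
        sigprocmask(SIG_SETMASK, &saved, NULL);  /* restore the mask; a      */
                                                 /* deferred SIGINT arrives  */
                                                 /* only now                 */
        puts("critical section completed despite any cancellation attempt");
        return 0;
    }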
Actions such as system shutdown by powering off during execution of TCB code may cause TCB
failures. Recovery mechanisms for TCB failures caused by power failures also may be able to handle
unexpected system shutdowns. In either case, during subsequent power-on procedures, the TCB not only
detects that TCB failures left the system in an inconsistent state, but also initiates recovery of a consistent
state before the system enters the normal mode of operation.
Somewhat less often, administrator or operator actions cause media failures. For example, initiation of on-line diagnostic tests of a media controller during normal mode of system operation, instead of the
maintenance mode, would most likely cause media failures. Similarly, initiation of TCB maintenance
actions such as disk reformatting in the normal mode of operation would certainly cause subsequent
media failures. Discontinuity of operation caused by administrator- or operator-induced failures may
require use of administrative recovery procedures.
3 PROPERTIES OF TRUSTED RECOVERY
The properties of trusted recovery are defined in terms of two notions: secure states and secure state
transitions. A system state is secure whenever consistency invariants derived from valid interpretations of
security and accountability models are satisfied. A state transition is secure if both its input state and its
output state are secure, and it satisfies the constraints placed on it by valid interpretations of security
policy and accountability policy models.
Accountability models include models of user authentication, trusted path, and audit. The notions of
invariants for secure states and constraints for specific state transitions are briefly illustrated in this
chapter and discussed in detail in reference [11]. Reference [29] discusses the notion of a valid
interpretation of a security model in detail and reference [3] illustrates it. For the sake of brevity,
interpretations of security models aren't illustrated in this guideline.
3.1 SECURE STATES
State-machine (or "state-transition") models of security, such as the Bell-La Padula model [3], define a
state in terms of the following abstract variables:
a. subjects (trusted and untrusted)
b. objects
c. access privileges
d. access matrix (defining the privileges of subjects to objects)
e. current access set (defining the privileges subjects can currently exercise over objects)
f. security function (defining the subject's maximum and current clearance and the object
classification)
g. object hierarchy
These abstract variables are used in defining state invariants that help define the notion of the secure
state. The following paragraphs labeled (1)-(5) discuss the use and characteristics of state invariants for
trusted recovery.
(1) Security invariants are derived formally from security model interpretations.
State-machine models also include conditions, or axioms, whose interpretations in a given system provide
invariant properties which must be satisfied by secure states. For example, the conditions of the Bell-La
Padula model include the following:
a. the simple-security condition for subjects
b. the *-property for the security function
c. the discretionary security condition for access privileges of current access sets
d. the compatibility condition for object hierarchies
The model description [3] precisely defines the specific meaning of these conditions in terms of the
variables of the system states and the examples of their valid interpretations.
(2) Informally derived invariants should augment formally derived invariants whenever necessary.
State-transition models might not include all the conditions that are relevant to the notion of security or
accountability. Whenever this is true, new invariants need to be defined to augment the set of existing
invariants derived from interpretation of model conditions (or axioms). For example, additional invariants
may be defined for objects such as the password file, user-account file, security map file, and system
configuration file, which are used by trusted processes of a system supporting the Bell-La Padula model.
These invariants may refer to multiple types of objects, as illustrated in this example:
In all system states, all user and group identifiers must be unique, integer values; the identifiers'
length may vary from zero to a defined maximum number of characters.
User and group identifiers may be included in a password file, a user account file, and a file defining
group membership. Additional invariants also may be needed for areas of security and accountability
policy where the interpretations of the model's conditions provide insufficient detail for a given system.
For example, an invariant that is specific to TCBs implementing Multics-like access control lists (ACLs)
[26] for discretionary access control may state:
In all system states, the entries of an ACL must be sorted as follows:
<user id.group id> entries precede <user id.*> entries
<user id.*> entries precede <*.group id> entries
<*.group id> entries precede <*.*> entries
where "*" represents the wild-card qualifier.
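A recovery mechanism might verify this ordering invariant with a check along the lines of the following C sketch. The entry representation, in which the integer 0 stands for the wild-card "*", is a hypothetical simplification.

    /* Sketch of a checker for the ACL-ordering invariant stated above. */
    #include <stdbool.h>
    #include <stdio.h>

    struct acl_entry { int user; int group; };   /* 0 means the "*" wild card */

    /* Rank entries by the required order:
       user.group < user.* < *.group < *.* */
    static int rank(const struct acl_entry *e)
    {
        if (e->user != 0 && e->group != 0) return 0;  /* <user id.group id> */
        if (e->user != 0)                  return 1;  /* <user id.*>        */
        if (e->group != 0)                 return 2;  /* <*.group id>       */
        return 3;                                     /* <*.*>              */
    }

    /* Returns true iff the ordering invariant holds for the ACL. */
    static bool acl_order_invariant(const struct acl_entry *acl, int n)
    {
        for (int i = 1; i < n; i++)
            if (rank(&acl[i - 1]) > rank(&acl[i]))
                return false;        /* inconsistent (insecure) state */
        return true;
    }

    int main(void)
    {
        struct acl_entry ok[]  = { {5, 7}, {5, 0}, {0, 7}, {0, 0} };
        struct acl_entry bad[] = { {0, 0}, {5, 7} };
        printf("%d %d\n", acl_order_invariant(ok, 4),    /* prints: 1 */
                          acl_order_invariant(bad, 2));  /* and:    0 */
        return 0;
    }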
Similar invariants also should augment other areas of access control and accountability (e.g., invariants
for user and system security profiles including minimum and maximum user clearances and object
classifications; invariants for audit mechanisms).
Recovery mechanisms of a TCB become trusted only if they maintain secure states in the normal mode of
operation, or detect insecure states and reconstruct secure states in the maintenance mode of operation,
despite the occurrence of failures and discontinuities. To detect insecure states after system failures or to
verify that recovered states are secure, recovery mechanisms must check whether security invariants are
satisfied. All security invariants are relevant to the recovery mechanisms handling state-transition failures.
These failures usually leave the system in the normal mode of operation, instead of causing the system to
enter maintenance mode. Therefore, all invariants which must hold in the normal mode of operation also
must hold after recovery from state-transition failures.
(3) Some security invariants may be irrelevant for trusted recovery.
Not all security invariants are relevant for other classes of failures. For example, consider the case of TCB
failures when all volatile memory is lost and the system enters maintenance mode. In maintenance mode,
consistent states are recovered, security invariants are checked, and the system is placed back in normal
mode of operation in a secure state that usually includes an empty set of user processes (i.e., untrusted
subjects) and a corresponding empty current access set.
In most system-recovery designs, recovery mechanisms restart only a few of the trusted processes (e.g.,
the "login" process listening to terminals). User processes in execution at the time of TCB failure are
aborted because their state is lost either along with the loss of volatile memory or during the recovery of
secure states. Users can start new processes executing the code of these aborted processes after the
system is placed in normal mode of operation. Therefore, after TCB failures, recovery mechanisms need
not check invariant properties referring to the current access set.
For example, from the set of the invariants derived from the conditions of the Bell-La Padula model, only
those corresponding to the simple-security condition and to the compatibility condition remain relevant
for trusted processes and for the object hierarchy, respectively, after system crashes. However, if the
current state of at least some user processes can be recovered after a TCB failure, other invariants
derived from the conditions of the Bell-La Padula model (e.g., the *-property) also become relevant.
Thus, the types of relevant invariants depend on the type of desired and possible system-state recovery (e.g., a state with no active processes, no opened files or devices, or a state with at least some active processes, opened files, and devices). The type of system-state recovery is determined either by the designer's choice or by other, non-TCSEC requirements. However, a complete set of security invariants
should be derived from interpretations of security models for any type of system-state recovery.
(4) All TCB integrity invariants should be satisfied by trusted recovery.
To detect insecure states, or to verify that recovered states are secure, recovery mechanisms also must
verify whether integrity invariants of the TCB are satisfied. Internal TCB integrity invariants do not
necessarily refer only to state variables defined by user-level security or integrity models. They also may
refer to internal TCB variables used for object and subject representation, and may not be visible at either the user or the administrative interface of the TCB. For example, internal TCB-integrity invariants may
require the satisfaction of the conditions below.
For all recovered states:
a. A disk sector is either free or allocated.
b. The sum of free and allocated disk sectors equals the total number of disk sectors.
c. Every allocated disk sector occurs exactly once in exactly one object representation,
and all free disk sectors do not belong to any object.
d. All active objects are reachable from the root of their hierarchy.
e. If an object's link count or reference count is zero, neither a link nor a reference exists
to that object.
Most operating systems provide recovery mechanisms which attempt to detect internal inconsistencies
after crashes and to recover consistent states. [1, §5.18] provides examples of internal operating system
invariants for the UNIX file system which are enforced by the file system checking program "fsck."
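As an illustration, the following C sketch checks invariants (a) through (c) over a toy in-memory image of nonvolatile storage. The free map and the per-object sector lists are hypothetical stand-ins for a real on-disk layout.

    /* Sketch of an fsck-style consistency check: every disk sector must
     * be either free or referenced by exactly one object representation. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NSECT 8
    #define NOBJ  3

    static bool free_map[NSECT];              /* true: sector on the free list */
    static int  obj_sectors[NOBJ][NSECT + 1]; /* -1 terminated sector lists    */

    static bool sector_invariants_hold(void)
    {
        int refs[NSECT] = {0};

        /* Count how many object representations reference each sector. */
        for (int o = 0; o < NOBJ; o++)
            for (int k = 0; obj_sectors[o][k] != -1; k++)
                refs[obj_sectors[o][k]]++;

        for (int s = 0; s < NSECT; s++) {
            if (free_map[s] && refs[s] != 0)
                return false;  /* a "free" sector belongs to an object      */
            if (!free_map[s] && refs[s] != 1)
                return false;  /* an allocated sector must occur exactly
                                  once in exactly one object representation */
        }
        return true;           /* invariant (b) holds by construction here:
                                  each sector is counted as free or allocated */
    }

    int main(void)
    {
        for (int o = 0; o < NOBJ; o++) obj_sectors[o][0] = -1; /* empty objects */
        for (int s = 0; s < NSECT; s++) free_map[s] = true;

        obj_sectors[0][0] = 2; obj_sectors[0][1] = -1; /* object 0 owns sector 2 */
        free_map[2] = false;                           /* ... which is allocated */

        printf("state consistent: %d\n", sector_invariants_hold());
        return 0;
    }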
(5) Lack of security invariants makes trusted recovery impossible.
The key role of secure-state determination in trusted recovery underlines the importance of security
invariants. Lack of such invariants makes it impossible to determine whether TCB primitives can tolerate
state-transition failures, i.e., whether TCB primitives clean up temporary state modifications before they
return to their caller. Similarly, lack of security invariants makes it impossible for the recovery mechanisms and procedures to determine whether the recovered state is secure even if all state transitions are designed to satisfy all of the model's constraints in the face of "expected" failures and discontinuities of
operations, discussed in Section 3.2. "Unexpected" TCB failures, media failures, or discontinuities may still
leave the system in insecure states. Whether such states are secure or insecure can only be determined
by verifying the state invariants.
3.2 SECURE STATE TRANSITIONS
A secure state transition originating from a secure input state guarantees that (1) its output state is
secure, and that (2) all security constraints defined for it are satisfied. In contrast with an invariant, which
is defined on the variables of individual states and holds for all state transitions, a constraint is a predicate
that relates the variables of two or more states. A constraint may be relevant only to specific state
transitions. A typical constraint is the following:
For any TCB primitive which changes the security level of an object, the new level must dominate
the original security level of the object.
This constraint expresses a policy requirement that users can only upgrade, but not downgrade, the
classification of an object. Other, similar constraints can be found in the appendix of reference [3], and in
reference [11].
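The following C sketch shows how such a constraint, unlike an invariant, relates the variables of two states, namely the security levels before and after a single transition. Representing a level as a hierarchical clearance plus a category bit mask is an illustrative assumption.

    /* Sketch of a check for the upgrade-only constraint stated above. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct level {
        int      clearance;    /* hierarchical component                 */
        uint32_t categories;   /* one bit per non-hierarchical category  */
    };

    /* true iff level a dominates level b */
    static bool dominates(struct level a, struct level b)
    {
        return a.clearance >= b.clearance &&
               (a.categories & b.categories) == b.categories;
    }

    /* The constraint relates the input and output states of one transition. */
    static bool upgrade_constraint(struct level before, struct level after)
    {
        return dominates(after, before);
    }

    int main(void)
    {
        struct level before  = { 2, 0x3 };  /* clearance 2, categories {A,B} */
        struct level upgrade = { 3, 0x7 };  /* proper upgrade                */
        struct level partial = { 3, 0x5 };  /* a partially written level
                                               that lost category B          */
        printf("%d %d\n", upgrade_constraint(before, upgrade),   /* 1 */
                          upgrade_constraint(before, partial));  /* 0 */
        return 0;
    }

The third level in the example models a partially written update of the kind discussed later in this section: it may still satisfy every state invariant, yet the constraint is violated because the new level no longer dominates the original one.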
(1) Trusted recovery requires the satisfaction of all constraints.
TCB primitives implement secure state transitions. The primitives may temporarily place the system in
insecure states before they return to their caller. Any failure or discontinuity of operation during a state
transition may leave the system in an insecure state. Furthermore, even if the recovered system state
appears secure, it's possible that the constraints placed on the secure state transition (during which the
TCB failed) are violated as a result of the failure.
For example, consider a state transition which changes the security level of an object and satisfies the
constraint defined above in absence of failures. Because a security level may be a multi-field, multi-word
data structure, any update to it requires the execution of several instructions including at least one I/O
instruction. A failure may occur in the middle of the security level update; e.g., a power failure during the
disk-write operation may leave the security level only partially updated. This may violate the constraint
defined above but may not affect any state invariant. The update may have changed the clearance field
and some, but not all, of the category fields of the security level before the failure. As a consequence, the
resulting object security level would no longer dominate the original object security level, violating the
above constraint. However, it's possible the resulting object security level satisfies all state invariants,
such as those derived from the interpretation of the simple-security and compatibility conditions of the
Bell-La Padula model. Recovery mechanisms would have no way to detect whether the recovered state,
which would include the update, is insecure.
Similar, and more probable, problems may appear when the TCB crashes in the middle of writing a label
of a newly created file, in the middle of a security map update by a system-administrator process, or in
the middle of an update to a file containing an access control list or any other security-relevant data
structure. The notion of atomicity, discussed in the paragraph below labeled (2), would prevent this type
of recovery problem from violating system-security policy.
Recovery mechanisms can detect that the system is in an insecure state following a TCB or media failure,
after extensive, time-consuming checks of security invariants, without relying on special designs of TCB
code implementing secure state transitions. However, seldom are the recovery mechanisms able to
recover an old secure state, or construct the new secure state from nonvolatile media, without relying on
special design and implementation of TCB code that anticipates various types of failures.
Because constraints relate variables of old states with those of the new states created by secure state
transitions, failures may make it impossible to verify constraints. In other words, the values of the old
states may no longer be available, whereas the values of the new states may not be fully updated at the
time of the failure. Thus, it's important to design the TCB code implementing state transitions so that the recovery of either the secure input state or the secure output state of a transition is assured.
In the latter case, the recovery mechanism should be guaranteed to satisfy the transition's security
constraints---not just the secure-state invariants.
(2) Atomicity of all TCB primitives ensures trusted recovery.
Atomicity is the key property of the TCB code implementing secure state transitions, enabling the recovery
mechanisms to reconstruct secure states despite the occurrence of (anticipated) failures. Assuming that
all state transitions can be serialized by concurrency control mechanisms [21], a state transition is
"atomic" in the face of failures if it's either carried out completely or has no effect at all. In practice, this
property implies that the TCB code implementing state transitions is designed to ensure that, for the class
of anticipated failures, the recovery mechanisms are always able to reconstruct either input secure states
or output secure states after failures.
The literature describes various approaches for providing atomicity at the operating system level [16, 21,
23, 25, 27], and at the database management system level [14,15]. We summarize these approaches in
Chapter 4.
(3) Atomicity of all TCB primitives is not always necessary for trusted recovery.
In many operating systems or TCB primitives, it may be difficult to ensure that all secure-state transitions
are atomic. Some operating system primitives consist of many individual object updates, and atomicity
requires the implementation of these primitives as "transactions" (defined in Section 4.3.3 and in
references [14,15,16, 21,23,25,27]). The performance penalties and complexity of implementing
transactional behavior in all TCB primitives might be prohibitive for many small operating systems.
In many small operating systems, a more practical approach is taken to the design of TCB primitives and
object representations to aid recovery mechanisms in the detection of inconsistencies. It's acceptable to
implement TCB-primitive mechanisms which only help detect inconsistent (insecure) states after crashes,
but do not ensure atomicity of secure-state transitions. These implementations leave the task of
reconstructing consistent or secure states to administrative users and tools. The choice of which TCB
primitives should be made atomic (if any) or which TCB primitives should only help detect insecure states
after crashes (but not be atomic) belongs to the system designers.
An example of a file system design allowing the detection of inconsistent states by a "scavenging" process
is provided in [22]. The scavenging process recovers files whose representation is intact, and deletes files
whose representations become inconsistent after crashes. To a large extent, the UNIX system calls are
designed to ensure that the "fsck" program can detect a variety of file system inconsistencies after
crashes, with only some facilities for consistent state recovery without administrative intervention, viz., [1,
Chs. 5.16-5.18] and [5]. Section 4.3.4 presents examples of design considerations for TCB primitives which
help the recovery mechanisms detect and correct inconsistencies.
4 DESIGN APPROACHES FOR TRUSTED RECOVERY
In this chapter we discuss the responsibilities of administrative users in trusted recovery, outline some
practical difficulties in designing trusted recovery mechanisms, and summarize several nonexclusive
design approaches to recovery that may be used in trusted systems. We also review the main options for
trusted recovery available to operating system designers. The primary goal of this chapter is to give the
reader an overview of the recovery design issues presented in the literature and implemented in various
systems during the past decade.
4.1 RESPONSIBILITY FOR TRUSTED RECOVERY
The responsibilities for trusted recovery fall into two categories depending on the type of failure that
occurred: (1) TCB design responsibility, and (2) administrator responsibility. State-transition failures are
routinely handled by designs of TCB primitives without placing the system in maintenance mode.
Controlled system shutdown and subsequent rebooting also is performed by the TCB with little or no
administrative intervention, viz., the "fsck" program of UNIX.
In contrast, emergency system restarts and cold starts are performed primarily by system administrators
using TCB-supported tools. Among the system administrators, the System Programmer role is responsible
for trusted recovery functions (e.g., consistency checks on system objects, on individual TCB files and
databases, repair of damaged security labels, and so on [24]). The Security Administrator also may
perform recovery responsibilities whenever the System Programmer's role is not separately identified by
the system's design.
The System Programmer role may have additional tools necessary for checking the integrity of TCB
structures and the security invariants of the initial state, i.e., for establishing the initial secure-system
state. The determination of this state is similar to the determination of whether a recovered state is
secure. However, the initial system state may not necessarily be identical to a recovered state because,
unlike a recovered state, the initial state may not contain any user-visible objects. Thus, fewer invariants
may be relevant for determining the security of the initial states.
Tools such as the ones suggested in reference [2], for identifying vulnerabilities of certain system states,
also may be useful both for establishing secure initial states and recovered states. Because some of the
tools used for detecting insecure states and for repairing various system data structures place the TCB in
maintenance mode (e.g., fsck tools [1, 5]), the use of these tools should be restricted to the System
Programmer role.
4.2 SOME PRACTICAL DIFFICULTIES WITH CURRENT FORMALISMS
The key role of secure state invariants and of state-transition constraints in trusted recovery suggests that
designs of B3-A1 systems should use state-machine models of security policy [11], instead of other types
of models, such as information flow models, which are not defined in terms of secure states and secure-state transitions. Although in principle this suggestion appears to be sound, in practice the existing state-machine models of security policy, such as [3], are not fully adequate for designing trusted recovery
mechanisms. This is the case for at least the following two reasons.
First, when current state-transition models are interpreted, the state invariants and transition constraints
for trusted processes acting on behalf of system administrators are inadequate in number. Second, extant
models do not include accountability properties of secure systems, thus making the formal derivation of
accountability invariants and constraints impossible. The apparent reason for these inadequacies of extant
state-machine models is that they are too abstract to include system-specific invariants and constraints
for the areas mentioned above.
Invariants and constraints that need to be defined for trusted processes and for administrative programs
in the accountability area are significantly more numerous and complex than those derived from extant
state-machine models for other TCB areas. An attempt to define state invariants in the trusted process
area---for formal security-verification purposes, not for trusted recovery---is reported in [4]. Appendix E
of reference [19] defines a fairly extensive list of invariant conditions for state variables in the
accountability area for the specific purpose of verifying the security of initial and recovered system states.
Lack of formal invariants and constraints for state variables and state transitions of trusted processes and
accountability programs makes it difficult to determine formally whether TCB primitives can tolerate even
state-transition failures, e.g., determine formally whether inconsistent states are cleaned up properly
before primitives return to callers. To determine formally whether recovery mechanisms can reconstruct
current states, or complete state transitions in maintenance mode after expected TCB or media failures, is
a challenging task for anyone. However, informal derivation of system-specific invariants and constraints
is acceptable for design of trusted-recovery mechanisms so long as evidence is provided of their correct
and extensive use in the design of these mechanisms. Such use represents significant added assurance that these
mechanisms are designed correctly.
4.3 SUMMARY OF CURRENT APPROACHES TO RECOVERY
4.3.1 Types of System Recovery
Operating systems' responses to failures can be classified into three general categories: (1) system
reboot, (2) emergency system restart, and (3) system cold start [14].
System reboot is performed after shutting down the system in a controlled manner in response to a TCB
failure. For example, when the TCB detects the exhaustion of space in some of its critical tables, or finds
inconsistent object data structures, it closes all objects, aborts all active user processes, and restarts with
no user process in execution. Before restart, however, the recovery mechanisms make a best effort to
correct the source of inconsistency. Occasionally, the mere termination of all processes frees up some
important resources, allowing restart with enough resources available. Note that system rebooting is
useful when the recovery mechanisms can determine that TCB and user data structures affecting system
security and integrity are, in fact, in a consistent state.
Emergency system restart is done after a system fails in an uncontrolled manner in response to a TCB or
media failure. In such cases, TCB and user objects on nonvolatile storage belonging to processes active at
the time of TCB or media failure may be left in an inconsistent state. The system enters maintenance
mode, recovery is performed automatically, and the system restarts with no user processes in progress
after bringing up the system in a consistent state.
System cold start occurs when unexpected TCB or media failures take place and the recovery
procedures cannot bring the system to a consistent state. TCB and user objects may remain in an
inconsistent state following attempts to recover automatically. Intervention of administrative personnel is
now required to bring the system to a consistent state from maintenance mode.
4.3.2 Current Approaches
A possible view of automated recovery mechanisms for TCBs would be to consider all internal TCB data
structures and security attributes as a database. This view is useful because significant technological
advances in database consistency and recovery have been made in the last decade; in principle, one
could use these advances for the design of trusted recovery for TCBs. For this reason, we review possible
approaches used in data management and other storage systems for implementing atomic actions and
transactions.
An alternative approach to recovery, used mostly at the operating system level, relies on detection of
inconsistencies caused by failures during non-atomic state transitions and subsequent correction of those
inconsistencies. This approach, generally called the "optimistic" approach to recovery (and to the
serializability of actions), assumes that TCB failures are rare, and therefore the overhead penalties of its
extensive recovery mechanisms don't affect the overall performance significantly. In general, the
optimistic approach to recovery has better overall performance than approaches which make all state
transitions atomic, but recovery is significantly more difficult to design. Section 4.3.4 provides examples of
operating system mechanisms that help detect TCB inconsistencies after failures.
4.3.3 Implementation of Atomic State Transitions
An action is atomic if it's unitary and serializable; it's unitary if it either happens or has no effect; it's
serializable if it's part of a collection of actions where any two or more actions relating to the same object
appear to execute in serial order [14,15,21,25,27]. A transaction consists of a sequence of read
and write actions, and is atomic if its entire sequence of actions is unitary and serializable. Examples of
TCB primitives which should be implemented as atomic actions or transactions include updating a
password file, updating a security map, changing a security level, changing a user security profile,
updating an ACL, and linking or unlinking an object to a hierarchy.
Three basic techniques are used to implement atomic actions and transactions:
(1) update shadowing, (2) update logging, and (3) combinations of update shadowing and logging. A
major difference between shadowing and logging is that systems using shadowing apply all updates to redundant disk pages or files until all updates of an atomic transaction are finished, and then introduce
these updates into the original system data; whereas systems using only logging write each update on a
log first and then update the original system data directly, i.e., the updates are done in place.
The advantages and disadvantages of both alternatives, the storage and I/O requirements for nonvolatile
memory, and the notions of two-phase commits in transaction implementations are discussed in [14,15].
4.3.3.1 Shadowing
The central idea behind making a TCB primitive behave as an atomic transaction is that of performing the primitive's updates in two phases. First, all the updates of the primitive are collected in a set of "intentions," without
changing the system's data, i.e., the updates are not performed in place. The last action of the primitive is
to commit its updates by adding a special "commit" record to the set of intentions.
Second, the updates of the intentions set replace the original data in the system, making the updates
visible to other system actions or transactions.
If a crash occurs after the commit record is written, but before all intentions replace the original data on
nonvolatile storage, the second phase is restarted by the recovery mechanisms as often as necessary
(because there may be subsequent crashes during recovery), without leaving any undesirable side effects.
Thus, the recovery mechanisms can complete a state transition interrupted by the crash by carrying out
all update intentions. Then the set of intentions is erased.
The recovery mechanisms are able to construct the new secure (output) state which would have been
produced by the state transition had failure not interrupted it. If a crash occurs before writing the commit
record, the recovery mechanisms can only find, or reconstruct, an incomplete set of intentions on
nonvolatile storage and erase them. Thus, the recovery mechanisms retain the original secure (input)
state prevailing before the crash interrupted the state transition.
To ensure that the above scheme works correctly for an expected set of TCB and media failures, the
nonvolatile media must be designed in such a way that (1) the recovery mechanisms can find or
reconstruct complete sets of intentions after a crash or determine the intentions list is incomplete (e.g.,
no commit record is found for it); and (2) the action which writes the commit record, and therefore
completes the set of intentions, is itself atomic.
To ensure property (1) above, redundant storage is used to represent sets of intentions, e.g., a pair of
related disk pages are written sequentially with the same intention data. The placement of each page of a
pair on nonvolatile media, which defines the pair's relationship, is chosen so at least one of the two pages
survives all "expected" failures. For example, if the redundant storage containing intentions lists is
expected to survive disk-head crashes, the disk should be duplexed and page "a" on the first disk also is
written as page "a" on the second disk. However, if the redundant storage is only expected to survive disk-read failures, page "a" also is written as page "f(a)" on the same disk, where the function "f" defines the
relationship between the addresses of the two pages. The only inconsistency-detection and reconstruction
tasks to be performed relate to the recovery of complete intentions sets after each crash. Note that the
inconsistency-detection and reconstruction tasks are placed at a low level within the TCB and, therefore,
need not concern themselves with the maintenance of type properties of various objects the TCB may
have updated at the time of the crash.
To ensure (2) above, the last disk-write operation to the intentions set, which commits the transaction,
should be implemented with a single hardware-provided I/O instruction that's either atomic or detectably
non-atomic with respect to the set of expected failures. Most commercially available I/O hardware satisfies
this requirement.
This scheme, called shadowing, was conceived by Lampson and Sturgis in 1976, was reported in detail in
[21], and was implemented in several distributed storage systems at Xerox PARC [20,23,25,27].
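To make the two-phase scheme concrete, consider the following C sketch of an intentions set with a
commit record. It is only an illustration under simplifying assumptions: the intentions set, the commit
flag, and the "disk" are ordinary memory here, whereas a real TCB records the intentions set redundantly
on nonvolatile storage, and all of the names (intend, commit, apply_intentions, recover) are hypothetical.

    #include <stdio.h>
    #include <string.h>

    #define MAX_INTENT 16
    #define NPAGES 8

    /* One intended update: "write 'val' to page 'page'". */
    struct intent_rec {
        int  page;
        char val;
    };

    /* The intentions set; a real TCB keeps this on redundant
       nonvolatile storage, not in memory. */
    static struct intent_rec intents[MAX_INTENT];
    static int n_intents;
    static int committed;           /* stands in for the commit record */

    static char disk[NPAGES];       /* the data updated "in place" */

    /* Phase 1: collect updates without touching the data in place. */
    static void intend(int page, char val)
    {
        intents[n_intents].page = page;
        intents[n_intents].val  = val;
        n_intents++;
    }

    /* Writing the commit record must itself be atomic; here, a flag. */
    static void commit(void) { committed = 1; }

    /* Phase 2: carry out the intentions, then erase the set. */
    static void apply_intentions(void)
    {
        for (int i = 0; i < n_intents; i++)
            disk[intents[i].page] = intents[i].val;
        n_intents = 0;              /* erase the intentions set */
        committed = 0;
    }

    /* Recovery: a committed set is (re)done; an incomplete one is
       simply erased, retaining the secure input state. */
    static void recover(void)
    {
        if (committed)
            apply_intentions();
        else
            n_intents = 0;
    }

    int main(void)
    {
        memset(disk, '.', sizeof disk);
        intend(2, 'a');
        intend(5, 'b');
        commit();
        recover();                  /* as if a crash followed commit */
        printf("%.8s\n", disk);     /* prints ..a..b.. */
        return 0;
    }

Because apply_intentions() merely rewrites the values recorded in the intentions set, it may be
restarted after repeated crashes without side effects; this is the idempotency property discussed in
Section 4.3.4.3.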
4.3.3.2 Logging
Mechanisms which assure that a TCB primitive exhibits transaction atomicity and which are based on
logging assume that nonvolatile storage exists which can be made reliable enough to survive all expected
failures, e.g., by storage duplexing. With logging, each update of a TCB primitive, including both the
original object values and the update values, is written onto a log represented on nonvolatile storage
before the object being updated is actually modified in storage, i.e., updated in place. The last action of
the primitive is to write a commit record on the log, signifying the end of the primitive's invocation.
If the system crashes before the commit point, the entire TCB primitive, i.e., transaction, is aborted.
Because the log is written before any individual update is made in place, all updates can be undone
during recovery merely by reading the log records of the primitive and restoring the original values saved
in the log entries. Complex consistency checks are unnecessary. Thus, the recovery mechanisms can
retain the original (input) state prevailing before the state transition was interrupted by the crash. This
protocol is called the "write-ahead log" protocol in [14,15] and is implemented in [18]. However, if the
system crashes after writing the commit record onto the log, then all updates of that TCB primitive can be
redone from the log records. Thus, the only reconstruction activity remaining is that of restoring object
contents from the log records.
Logging mechanisms similar to the one just outlined have been used in database management systems
for a long time. Complete descriptions of similar logging mechanisms, which in practice also may include
state checkpointing and transaction savepointing features, are found in [14, 15], and in systems
reference manuals of many commercially available database management systems [18].
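The write-ahead discipline can also be sketched in a few lines of C. As before, the log and the commit
record are modeled in memory purely for illustration (a real implementation forces them to reliable
nonvolatile storage before any in-place write), and the names (log_rec, update, recover) are hypothetical
rather than taken from any cited system.

    #include <stdio.h>
    #include <string.h>

    #define MAX_LOG 16
    #define NPAGES  8

    /* One log record: old and new value of a page, written to the log
       BEFORE the page itself is updated in place (write-ahead rule). */
    struct log_rec { int page; char oldv, newv; };

    static struct log_rec logbuf[MAX_LOG];
    static int n_log;
    static int committed;           /* stands in for the commit record */
    static char disk[NPAGES];

    /* Update in place, but only after logging old and new values. */
    static void update(int page, char val)
    {
        logbuf[n_log].page = page;
        logbuf[n_log].oldv = disk[page];
        logbuf[n_log].newv = val;
        n_log++;
        disk[page] = val;           /* the in-place write */
    }

    static void commit(void) { committed = 1; }

    /* Recovery: before the commit point, undo from the old values
       (in reverse order); after it, redo from the new values. */
    static void recover(void)
    {
        if (!committed) {
            for (int i = n_log - 1; i >= 0; i--)
                disk[logbuf[i].page] = logbuf[i].oldv;
        } else {
            for (int i = 0; i < n_log; i++)
                disk[logbuf[i].page] = logbuf[i].newv;
        }
        n_log = 0;
        committed = 0;
    }

    int main(void)
    {
        memset(disk, '.', sizeof disk);
        update(1, 'x');
        update(4, 'y');
        recover();                  /* crash before commit: undone */
        printf("%.8s\n", disk);     /* prints ........ */
        return 0;
    }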
4.3.3.3 Logging and Shadowing
Both logging and shadowing have relative advantages and disadvantages. These are discussed in [15].
Gray et al. point out that the performance characteristics of shadowing make it a desirable mechanism for
small databases and systems, whereas logging is desirable in large database systems. In practice many
systems combine both mechanisms to retain the advantages of each [15,16].
In general, operating systems and TCBs maintain relatively small databases for implementing security
policies. Many of the extant operating systems, such as UNIX, already use simple versions of file
shadowing to do some atomic actions and transactions. Few operating systems implement logging at the
TCB level for recovery reasons. Logging is used mostly at the application level, outside the TCB.
Both informal [21] and formal [16] proofs of correctness for recovery mechanisms using shadowing, or
shadowing and logging, are in the literature. All these proofs make only informal assumptions of failure
behavior.
4.3.4 Recovery with Non-Atomic State Transitions
Many of the extant operating systems take an "optimistic" approach to recovery. Designs of such systems
assume failures requiring system restart are rare. Therefore, the performance penalties in normal mode of
operation and design complexity caused by TCB primitives with atomic behavior may be unwarranted.
Whether such penalties are significant is still a matter of some debate. For some performance figures the
reader should consult references [14] and [23].
Subtle system-specific properties are encountered in designs of TCB primitives that don't support atomic
state transitions but nevertheless help ensure that the recovery mechanisms can reconstruct a consistent
system state. In this section, we present an example of a generic source of inconsistency caused by TCB
crashes and illustrate specific properties of non-atomic TCB primitives enabling recovery of consistent,
secure states. Of course, these properties aren't necessarily sufficient for trusted recovery. Although other
similar properties may be found for trusted recovery, demonstrating their sufficiency for trusted recovery
with non-atomic TCB primitives remains a challenge of individual system design.
4.3.4.1 Sources of Inconsistency---A Generic Example
A typical inconsistency caused by TCB crashes appears in operating systems maintaining several related
data structures within the TCB to enable the sharing, protection, and management of objects. For
example, user-level references to an object, which are stored in directories, typically point to a single
"object map" (e.g., an entry in the "object map table," in the "master object directory," in the "global
object table," in a "volume table of contents," or an i-node in the "i-node space"). This map contains
various fields defining object attributes such as length, status (active/inactive, locked/unlocked, access
and modification time, etc.), access privileges or ACL references, and a list of memory blocks allocated to
the object referenced by the map. The map identifies the object's representation on secondary storage. A
copy of all the object maps representing active objects is kept in primary memory. These copies identify
the representation of objects in a cache of active objects. The memory blocks allocated to the object
representation may themselves contain user-level references to other objects, e.g., whenever the object
is a directory.
Any TCB primitive which deletes such an object has to invalidate and/or deallocate at least two of the
three related data structures, viz., the object representation and the object map. In most systems, the
outstanding user-level references to the deallocated object, e.g., user-directory entries for the deallocated
object, also have to be invalidated or deleted. Capability-based systems are an exception to this because
capabilities are user-level object references which can't be reused owing to their use of system-wide
unique identifiers [12].
The invalidation or deletion of the three types of related structures (user-level references → object map →
object representation) during a TCB-primitive invocation should be performed in a single atomic action or
transaction. Similarly, any TCB-primitive invocation which creates an object and a reference to it would
have to allocate all three related structures in a single atomic action or transaction. Should the TCB
primitives allocate/deallocate the three related structures independently using non-atomic actions, crashes
between the allocation/deallocation of one of the related structures and the rest might leave references
pointing to nonexistent objects, causing a "dangling-reference" problem. Alternatively, such crashes might
leave references to already allocated objects of other users, causing an "object-reuse" problem.
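A minimal C sketch of the hazard described above follows. The structures are hypothetical
simplifications of a directory entry and an object map, and the "crash" is indicated only by a comment;
the sketch deliberately stops short of dereferencing the dangling pointer.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Object map (cf. an i-node or object-table entry): attributes
       plus the identification of the object's storage. */
    struct object_map {
        size_t length;
        int    acl_ref;             /* reference to a separate ACL */
        int    blocks[4];           /* storage allocated to the object */
    };

    /* User-level reference: a directory entry naming the map. */
    struct dir_entry {
        char name[32];
        struct object_map *map;
    };

    int main(void)
    {
        struct dir_entry d;
        strcpy(d.name, "report");
        d.map = calloc(1, sizeof *d.map);
        if (!d.map)
            return 1;

        /* A non-atomic delete that frees the map first... */
        free(d.map);
        /* ...crash here: the directory entry still names the freed
           map, i.e., a dangling reference.  If the map's storage is
           then reallocated to another user, d.map silently refers to
           the new owner's object: the object-reuse problem. */
        printf("entry \"%s\" still present after crash\n", d.name);
        return 0;
    }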
4.3.4.2 Non-Atomic TCB Primitives
Some designs of TCB primitives enable recovery mechanisms to reconstruct consistent states without
requiring atomicity of (all) TCB primitives and (all) object updates. We present two properties of TCB
designs below illustrating this fact.
Example 1---Ordered Disk-Writes within TCB Primitives
TCB primitives which allocate/deallocate TCB structures, such as the three related structures mentioned
previously, could order their write (update) operations to nonvolatile storage in such a way as to prevent
both dangling references and object reuse problems after crashes. Consider the following order of
updates and requirement for synchronous writes:
(1) In any TCB primitive which deallocates user-level references, object map, and object
representations within one invocation, the necessary disk-write operations should follow the
direction of references; i.e., the objects containing the user-level reference should be written to
the disk first, the object map next, and the object representation last.
In any TCB primitive which allocates user-level references, object map, and object representation
within one invocation, the necessary disk operations should follow the direction opposite to that of
references; i.e., the object representation should be written to the disk first, the object map next,
and the object containing the user-level reference last.
(2) All disk-write operations of a TCB primitive which updates user-level references, object map,
and object representation within one invocation, should be synchronous; i.e., the caller should not
be allowed to proceed until the disk-write operation completes.
The usefulness of ordering disk-writes in all relevant TCB primitives is apparent when one considers the
effects of system crashes and subsequent TCB rebooting. Suppose the order of disk-write operations
mentioned above does not hold for a TCB primitive which deallocates an object along with the last
reference to it. In this scenario, the object map and representation could be deallocated first, and their
corresponding tables could be updated by two disk-write operations before the object containing the
user-level reference would be updated.
A crash occurring after the completion of the first two disk-write operations leaves the user-level
reference dangling. This requires the recovery procedures to search the entire directory system to find the
dangling reference (if any) before the TCB is rebooted and before the deallocated object map and
representation space are reallocated. Should this not happen, or should recovery procedures fail to
discover dangling references after crashes, the system might enter an insecure state causing object-reuse
problems. A similar problem appears when the crash occurs after the deallocation of the object
representation but before the deletion of the object map and the user-level reference.
In contrast, if the disk-write operations required by object deallocation and reference deletion are ordered
as suggested in requirement (1) of Example 1, a crash between two consecutive disk-write operations
could only cause objects or their maps to become inaccessible. This might cause a denial-of-service
problem but not an object-reuse problem.
For example, a crash could occur after the last user-level reference was deleted, and after the
corresponding disk-write was completed, but before the disk writes required by the object map and
representation deletions could be completed. Both the object representation and map become
inaccessible to any user-level programs.
Object inaccessibility also could be caused by crashes occurring after both user-level references and
object map are deleted. However, to handle object inaccessibility, recovery procedures need only do
"garbage collection" operations. Should these operations (inherently simpler than those finding all
dangling references) not be done, or should they fail to find all inaccessible objects after a crash,
denial-of-service (but not security policy violations) may occur. This example shows that the proper
ordering of disk-write operations in all relevant TCB primitives helps reduce the security vulnerability
of a system after crashes.
The requirement (2) above for synchronous disk-write operations is necessary to preserve the ordering of
disk-writes suggested in requirement (1). Asynchronous disk-write operations don't necessarily preserve
such ordering.
Similar disk-write ordering rules apply to multi-leaf trees of references. Such a tree appears in sets of
related data structures (user-level references → object map → object representation) when a reference to
a separate ACL object is placed in, or is implicitly associated with, the object map (i.e., object map →
object representation and object map → ACL). In such a case, both the object representation and the ACL
should be created before the object map, and the object map should be deleted before both the object
representation and the ACL. Failure to order the disk-write operations in the proper sequence may cause
the reuse of the ACL with foreign objects after crashes. This would represent a serious access control
problem if left uncorrected.
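The following C sketch illustrates the ordering discipline of requirement (1). Here sync_write() is a
hypothetical stand-in for a synchronous disk write (cf. forcing a buffer with fsync()), and the three
related structures are represented only by the names of the disk areas being written.

    #include <stdio.h>

    /* Hypothetical synchronous write: the caller does not proceed
       until the data is on nonvolatile storage. */
    static void sync_write(const char *what)
    {
        printf("synchronously wrote: %s\n", what);
    }

    /* Deallocation follows the direction of references: user-level
       reference first, object map next, representation last.  A crash
       between any two steps can only leave an unreferenced (garbage)
       map or representation, never a dangling reference. */
    static void deallocate_object(void)
    {
        sync_write("directory page (user-level reference removed)");
        sync_write("object map table (map freed)");
        sync_write("block allocation map (representation freed)");
    }

    /* Allocation runs in the opposite direction, so a crash can only
       leave an allocated-but-unreferenced object, never a reference
       to a not-yet-created one. */
    static void allocate_object(void)
    {
        sync_write("block allocation map (representation created)");
        sync_write("object map table (map created)");
        sync_write("directory page (user-level reference added)");
    }

    int main(void)
    {
        allocate_object();
        deallocate_object();
        return 0;
    }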
Example 2---Redundancy of TCB Storage Structures
The idea of maintaining redundant storage for critical data structures, enabling recovery mechanisms to
reconstruct consistent system states, is neither new nor novel. Similar ideas have led to the use of
"double-entry bookkeeping" in accounting systems to accomplish similar goals. Furthermore, it could be
argued the design approaches for atomic actions or transactions of TCB primitives---as discussed in the
previous section---also use redundant storage structures. We suggest here that redundant storage
structures can be used solely to aid recovery mechanisms to detect, and possibly correct, inconsistent TCB
structures, and not necessarily to provide failure atomicity for all TCB primitives.
Although minimized TCBs, such as those required for security classes B3 and A1, are likely to present at
their interface only objects with a very simple structure, e.g., segments as opposed to files, the representation
of these objects on nonvolatile storage usually requires the maintenance of other more primitive
structures. The discovery of inconsistent or even inaccessible objects after a crash, but not necessarily the
provision of atomic object updates, can be assured by maintenance of redundant, low-level TCB
structures.
For example, consider a system in which a segment representation on the disk consists of a set of not
necessarily contiguous pages (i.e., disk sectors) identified by an index page. To enable recovery
mechanisms to detect inconsistent segment structures, redundant structures can be maintained
independently of index pages that define the unique association between object identifiers and pages. All
TCB primitives would be designed to reflect every update of an index page in the redundant structure
also, and vice versa; however, updates would be reflected only after disk writes to originals had
completed successfully (i.e., disk writes to index pages are synchronous). Recovery mechanisms would
detect, and possibly correct, inconsistencies of segment representation by comparing the contents of
index pages with those of the corresponding redundant structures. The additional ability to detect which
of the two structures is affected by a failure, and the ability to ensure that one of the two structures
survives all expected failures, enables recovery mechanisms to correct either one of the two structures, as
necessary.
Redundancy of segment representation structures can't guarantee that the segment contents would be
internally consistent, nor that they would be consistent with the contents of other segments.
Failure-atomicity of TCB primitives wouldn't be guaranteed in any way by the maintenance of redundant
structures suggested in Example 2, although similar structures could be used as the basis for additional
mechanisms which would implement failure atomicity [23].
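A small C sketch of the redundancy idea: every update is reflected in both the index structure and an
independently maintained copy, and recovery detects inconsistencies by comparison. The names and the
in-memory arrays are illustrative assumptions; deciding which of the two structures a failure actually
damaged requires additional information (e.g., per-structure checksums), as noted above.

    #include <stdio.h>

    #define NPAGES 8

    /* Primary structure: an index page mapping object pages to disk
       pages; redundant structure: an independently updated copy
       (cf. one cylinder-map entry per disk page). */
    static int index_page[NPAGES];
    static int redundant[NPAGES];

    /* All TCB primitives update both, the original first and
       synchronously, then the redundant copy. */
    static void map_page(int i, int disk_page)
    {
        index_page[i] = disk_page;  /* synchronous write to original */
        redundant[i]  = disk_page;  /* then reflected in the copy */
    }

    /* Recovery compares the two; a mismatch marks an inconsistent
       segment representation. */
    static int check_consistency(void)
    {
        for (int i = 0; i < NPAGES; i++)
            if (index_page[i] != redundant[i])
                return i;           /* first inconsistent entry */
        return -1;                  /* consistent */
    }

    int main(void)
    {
        map_page(0, 42);
        index_page[3] = 7;          /* simulate a crash mid-update */
        printf("inconsistent entry: %d\n", check_consistency());
        return 0;
    }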
4.3.4.3 Idempotency of Recovery Procedures
Recovery procedures should be restartable after repeated crashes, and every time they're restarted, the
TCB should be left in no worse state than before. This property is called idempotency, and should exist
regardless of the type of disk-writes or redundancy used within the TCB. Repeated crashes of recovery
procedures would have no undesirable side effects in the system's TCB. Although recovery procedures of
systems supporting atomic TCB primitives and transactions also must be idempotent, the demonstration
of idempotency of these procedures seems significantly simpler than in systems in which failure-atomicity
of TCB primitives isn't supported. The idempotency property of recovery mechanisms using "intention
sets" ---discussed in Section 4.3.3.1---is presented in [21].
4.3.4.4 Recovery With Non-Atomic System Primitives
Example 1 --- The UNIX File System
The ordering of synchronous disk-write operations is enforced by UNIX kernels [1]. For example,
whenever the "unlink" primitive deletes the last link to an object and deallocates the object, and
whenever the "creat" primitive allocates an object and places a reference to it in a directory, the diskwrite operations are ordered as suggested in (1) of Section 4.3.4.2 and are performed synchronously.
User-level references are represented in UNIX by i-node numbers in directory entries, object maps by nodes, and object representations by disk blocks. Reference [1, ch. 5] describes the negative
consequences of not ordering the disk-write operations, and of not doing them synchronously.
The UNIX program "fsck" is also a good example of a typical recovery program. Fsck detects dangling
i-nodes and i-node numbers, duplicate i-nodes associated with the same disk block, unbalanced i-node
references and object links, as well as lost and corrupted i-nodes and disk blocks. However, for dangling
i-nodes and i-node numbers, duplicate i-nodes associated with the same disk block, and unbalanced i-node
references and object links, "fsck" requires the intervention of administrative users to resolve
inconsistencies. As pointed out in [5], seldom is enough information available to enable administrators to
make correct decisions and resolve inconsistencies. The changes proposed in [5] would help
administrators make sound low-level recovery decisions. Thus, not only should non-atomic system
primitives be designed to support recovery, but recovery mechanisms should also enable administrators
to make correct recovery decisions.
Example 2---The Cambridge File Server
The Cambridge File Server (CFS) actually implements failure-atomic updates of individual files [23].
Although a good example of disk redundancy, the mechanism provided by CFS can be used to guarantee
only that recovery mechanisms can reconstruct consistent file structures, not necessarily file contents.
In the CFS, a file's representation on the disk consists of one or more index pages pointing to data pages.
A file can be viewed as a single-root tree whose nodes are index pages and whose leaves are data pages.
CFS maintains a redundant structure, called the cylinder map, to represent the relationship between file
identifiers, file pages, and page status for all objects of each disk cylinder. Each cylinder map entry
contains (1) the unique identifier of the file to which the defined page belongs, (2) the tree address
occupied by the page in the file representation, and (3) the allocation and use state of the page. Thus,
each disk page is defined by its entry in the cylinder map and by its position in the tree of pages
representing the structure.
The CFS primitives update the cylinder maps only after a file's tree is updated consistently. Thus, the CFS
recovery procedures can rebuild the file system structure using redundant information after a
crash-producing failure. For example, whenever a crash corrupts a file's index page, the recovery procedures
search the cylinder maps to find all pages belonging to that file. At the end of the search, all pages
referenced by the corrupted index page and their tree positions are found, and the corrupted index page
can be reconstructed. Conversely, whenever a cylinder map is corrupted by a crash, the recovery
procedures examine each file structure starting at the root of the file system to find all disk pages
belonging to the corrupted cylinder map. At the end of the search, which may include the entire file
system and may take as long as half an hour with current disk technology, the corrupted cylinder map is
reconstructed.
The cylinder maps of the CFS are used for additional functions, including that of keeping status
information for pages of intention sets. Although the use of the redundancy function of cylinder maps in
providing recoverable storage for implementing atomic actions has been less successful than that used in
shadow paging [23], the use of this function for automated recovery of file system structures is quite
adequate.
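The cylinder-map idea can be sketched as follows. The entry layout and names are illustrative
assumptions, not the actual CFS format; the point is only that a per-page redundant record of (file
identifier, tree address, allocation state) suffices to rebuild a corrupted index page by scanning.

    #include <stdio.h>

    #define NPAGES 12
    #define NOFILE (-1)

    /* One cylinder-map entry per disk page: which file the page
       belongs to, its position (tree address) in that file, and its
       allocation state. */
    struct cmap_entry {
        int file_id;                /* NOFILE if the page is free */
        int tree_addr;              /* position in the file's tree */
        int allocated;
    };

    static struct cmap_entry cmap[NPAGES];

    /* Rebuild a file's corrupted index page by scanning the cylinder
       map for every page that belongs to the file, recovering both
       the page set and each page's position in the tree. */
    static void rebuild_index(int file_id)
    {
        for (int p = 0; p < NPAGES; p++)
            if (cmap[p].allocated && cmap[p].file_id == file_id)
                printf("file %d: disk page %d at tree address %d\n",
                       file_id, p, cmap[p].tree_addr);
    }

    int main(void)
    {
        for (int p = 0; p < NPAGES; p++)
            cmap[p].file_id = NOFILE;
        cmap[3] = (struct cmap_entry){ 7, 0, 1 };   /* file 7, root   */
        cmap[9] = (struct cmap_entry){ 7, 1, 1 };   /* file 7, leaf 1 */
        rebuild_index(7);
        return 0;
    }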
Example 3---Selective Atomicity
Another approach to recovery in a secure state without using atomic TCB primitives is "selective
atomicity." Selective atomicity means that only a subset of the TCB actions are atomic when operating on
a specific subset of TCB objects. The intent is to preserve the consistency of the representation of certain
objects (e.g., those implemented in nonvolatile storage) and object hierarchies (e.g., file systems) but not
necessarily of object contents. Except where it's security-relevant, the consistency of object contents is
left in the care of the particular application. Special atomicity mechanisms maintain the consistency of
the content of security-relevant objects, a small subset of all the system's objects.
The advantages of using the selective-atomicity approach are that the performance penalties of full
atomicity are avoided and system recovery in a secure state is simplified. For example, recovery programs
could restore the structure of a file system (including directories, disk blocks, and indices to physical
records), minimizing administrative intervention, a feature not found in most recovery programs such as
"fsck."
Selective atomicity exists in the AIX system version 3.1. In this system, a fairly robust file system is
obtained by implementing atomic updates for segments that contain directories, i-nodes, and indirect file
blocks. Also, any change to the disk-block allocation map is atomic [6]. The implementation of the
atomicity features is based on logging using a concept called "database memory," because of its
resemblance to the logging features of database management systems [7]. Note that the selective
atomicity implemented in AIX 3.1 does not refer to non-kernel objects (e.g., objects implemented by
system processes). Thus, a TCB using the AIX 3.1 kernel would have to carry out atomic or recoverable
actions within trusted processes whenever such actions are required.
4.4 DESIGN OPTIONS FOR TRUSTED RECOVERY
The design and implementation of trusted recovery mechanisms and procedures should include the
following nonexclusive options:
(1) For the set of all state-transition failures, TCB recovery code should ensure the restoration of
the secure input state; i.e., remove all temporary modifications of the secure state which violate
security invariants.
(2) For the set of expected TCB or media failures (appropriately chosen by the system designers),
TCB code that causes state transitions should be designed, whenever possible, to make secure-state
transitions atomic; i.e., the recovery mechanisms should reconstruct either the secure input
states or the secure output states of those transitions.
(3) For the subset of state transitions that aren't atomic in the face of expected TCB or media
failures, the recovery mechanisms should detect that these state transitions have left the TCB in
insecure states. TCB-supported administrative tools should enable the reconstruction of a
predictable secure state, with or without administrative-user intervention. This state may differ
from both the secure input and output states of the transition during which the failure occurred.
(4) For the complementary set of "unexpected" or rare TCB and media failures, administrative
procedures and tools (used to restore a predictable secure state) should be defined and
documented. This secure state also may differ from both the secure input and output states of the
transition during which the failure occurred.
Options (3) and (4) above suggest that expected TCB or media failures caused by either spontaneous
events or user-, administrator-, or operator-induced discontinuities may create secure states which,
nevertheless, violate integrity and availability requirements of user applications. Clearly, if the recovered
state differs from either the secure input or secure output states of the transition during which the failure
occurred, then both "lost" updates and "dirty" reads are possible [13,14]. However, options (3) and (4)
are still acceptable for systems evaluated under the TCSEC because the TCSEC does not include
application integrity and availability requirements.
If user applications require higher degrees of integrity and availability than those supported by the TCB,
they could always implement additional, separate recovery mechanisms. In fact, this approach is taken by
most current applications including database management systems [14,15]. This approach is also sound
from a system architecture point of view because it separates application recovery mechanisms from
those of the TCB. Support of application level recovery within the TCB is both unnecessary and
unwarranted. It's unnecessary because application portability rules out reliance on the recovery features
of a specific TCB, and thus applications tend to implement their own recovery features. It's unwarranted
because it violates the class B3-A1 requirement for TCB minimality and increases the assurance burden.
Trusted recovery requires that the state-transition models used for secure system design include both
state invariants and transition constraints. A model that includes only state invariants is inadequate
because trusted recovery requires that state-transition constraints be satisfied (viz., discussion in Section
3.2). Similarly, a model that includes only transition constraints is inadequate because the occurrence of
unanticipated failures and discontinuities of operations may prevent the system from completing state
transitions and place it in insecure states. These states can only be determined to be insecure after
checking that secure-state invariants are not satisfied (viz., discussion in Section 3.1).
The reader should note that state-transition models of security policy are particularly suitable for the
design of trusted recovery mechanisms. Because these models include the notion of secure states and
secure state transitions, they can be integrated with recovery models, all defined in terms of states and
state transitions (which are possibly different). Unlike the state-transition models, information flow and
noninterference models don't include explicitly the notions of state and state transitions, and thus are
more difficult, if not impossible, to use for defining formally the notion of trusted recovery. Furthermore,
information flow and noninterference models cover nondiscretionary access controls and thus lack
discretionary access control and other policy components. This makes the use of such models impractical
for the formal definition of trusted recovery.
5 IMPACT OF OTHER TCSEC REQUIREMENTS ON TRUSTED
RECOVERY
Security policy and accountability requirements of the TCSEC are only indirectly relevant to trusted
recovery. That is, specific requirements of these areas, which may be relevant to trusted recovery, have
already been levied on trusted facility management functions and interfaces [24]. In this chapter, we
focus only on the TCSEC areas specific to trusted recovery; we discuss the relevance or irrelevance of
specific requirements.
5.1 OPERATIONAL ASSURANCE
Most of the assurance requirements of the TCSEC apply to trusted recovery because trusted-recovery
code is part of a system's TCB. Some TCSEC assurance requirements become irrelevant because
interfaces to trusted recovery functions are either invisible to users or, whenever they are visible, can be
used only by administrative personnel authorized by trusted facility management. The user visibility of
trusted recovery interfaces, or lack thereof, is established under the assurance requirements of trusted
facility management [24], and therefore we don't repeat it here.
In the operational assurance area, only the trusted facility management and the system architecture areas
have specific requirements relevant to trusted recovery. Because system integrity requirements refer to
the diagnostic testing of the hardware and firmware elements of the TCB, they have no special relevance
here beyond that of addressing hardware/firmware elements that may include recovery mechanisms.
Covert channel analysis of TCB interfaces offered by trusted recovery isn't necessary.
Administrative users are the only users who may use trusted recovery mechanisms. They have multi-level
access to system and user data and are trusted to maintain the data secrecy and not exploit covert
channels while operating in administrative roles. Thus, administrative users must be cleared to the highest
level of data classification present on the system. Furthermore, all code implementing trusted recovery
functions should be scrutinized to ensure, to the largest extent possible, these functions don't contain any
Trojan Horses or Trap Doors.
Most system-architecture requirements of the TCB apply to trusted-recovery code. For example, TCB
programs and data structures implementing trusted recovery must comply with the following
requirements:
a. Satisfy modularity requirements.
b. Make significant use of abstraction and information hiding.
c. Use layering of recovery functions.
d. Satisfy the requirements of the least privilege principle to the largest possible extent.
Since trusted recovery is used mostly in maintenance mode when all storage, segmented or not, must be
available to recovery code, most protection mechanisms are disabled. Thus, application of the least
privilege principle and insistence on use of logically distinct objects with separate attributes is less obvious
here than when mechanisms are used in the normal mode of system operation.
The only trusted facility management requirement affecting trusted recovery is that of segregating the
security-relevant from the security-irrelevant administrative functions. Because trusted recovery functions
are obviously security relevant, they must be allocated either to the System Administrator or to the
System Programmer roles [24].
5.2 LIFE-CYCLE ASSURANCE
In contrast with operational assurance, all areas of life-cycle assurance are relevant to trusted
recovery. These areas are security testing, design specification and verification, configuration
management, and trusted distribution.
5.2.1 Security Testing
The purpose of testing trusted recovery mechanisms is to uncover design and implementation flaws that
allow failure recovery to place the TCB in insecure states. The major issue in this area is delimiting the
scope of security testing, i.e., reconciling the general objectives and practices of security testing with the
limited coverage of failures and discontinuities of operation which is possible in practice.
The objectives of security testing suggest that security testing should be performed using test fixtures
external to the TCB, should not require TCB instrumentation, should be repeatable, and should include
precise coverage analysis. For testing trusted recovery functions, only the requirements of class B3 are
relevant because, as discussed below, formal top-level specifications aren't necessary for trusted recovery,
viz., [24].
However, only state-transition failures and discontinuities of operation (but not all TCB and media failures)
can be generated from outside the TCB in a repeatable manner and without any TCB instrumentation.
Whenever TCB and media failures cannot be generated from outside the TCB in a repeatable manner and
without any internal TCB instrumentation, design and implementation analysis and review are necessary
to determine whether the recovery mechanisms can handle the untested failure responses.
State-transition failures can be generated using similar test-plan structures and programs as those used for
security testing of other TCB areas, e.g., testing security policy enforcement and covert channel
bandwidth.
Discontinuities of operation can be generated using administrative interfaces in ways causing at least
some TCB, and possibly media, failures repeatedly. In contrast, spontaneous TCB and media failures
cannot be regenerated without software and hardware instrumentation of the TCB. Use of such
instrumentation would violate one of the major objectives of security testing. Recall that TCB
instrumentation is undesirable because either it precludes testing the system in normal mode
configuration or it leaves test fixtures that may become exploitable Trap Doors in the TCB. Therefore,
security testing of trusted recovery functions is limited to the use of test plans, i.e., test conditions, data,
and coverage analysis, which cover only state-transition failures and discontinuity of operation, as
defined in Chapter 2. Within this limited context, all conventional security-testing requirements and
recommendations are applicable.
5.2.2 Design Specification and Verification
The inherent inability to define formal models of TCB failures and discontinuities of operations (viz.,
Chapter 2 of this guideline and reference [21]), and the general lack of formal models of trusted facility
management and administrative roles, make the TCSEC requirement for top-level specification
correspondence with the formal policy model irrelevant to trusted recovery. However, the requirement
for use of a security
policy model, and more precisely for a state-machine model, is relevant to trusted recovery in two areas.
First, state-machine models enable designers and implementors to define the notions of secure system
states and state transitions. These notions are the key to trusted recovery as they provide the security
invariants and constraints recovery mechanisms should satisfy. Recovery functions earn their trust only if
they satisfy these invariants and constraints, as discussed in Chapter 4.
Second, the response of TCB primitives to state-transition failures (defined in Chapter 2) is modeled by
formal security-policy models and specified by top-level specifications. For example, the Bell-La Padula
model represents a clear, albeit incomplete, attempt to model these failures through the provision of the
"error" and "?" elements of the TCB response set (Dm) to invocations of TCB requests (Rk) [3]. Thus, the
specification of error messages and exceptions provided by TCB primitives in response to state-transition
failures also is required for trusted recovery reasons.
5.2.3 Configuration Management
All configuration management requirements of classes B3 and A1 apply as stated.
5.2.4 Trusted Distribution
All trusted distribution requirements of class A1 apply to the TCB functions and interfaces implementing
trusted recovery as stated.
5.3 DOCUMENTATION
Most documentation requirements of the classes B3 and A1 apply to trusted recovery as stated in each
evaluation class. However, some requirements, such as those stating the need for a Security Features
Users' Guide (SFUG) and for covert channel documentation, are obviously not relevant. The SFUG is
relevant for non-administrative users whereas trusted recovery is exclusively a responsibility of system
administrators. The administrators are implicitly trusted not to disclose classified and proprietary
information they can obtain from the system directly without having to use covert channels.
5.3.1 Trusted Facility Manual
The Trusted Facility Manual (TFM) requirements are not only relevant but important to trusted recovery.
The TFM must include the description of procedures necessary "to resume secure system operation after
any lapse of system operation." Thus the TFM should include a description of the types of TCB failures
and discontinuities of operation and a list of procedures, tools, warnings, and examples of how these
failures might be best handled.
All TCB recovery procedures must be defined in the TFM. These procedures include analyzing system
"dumps" after crashes, crash-recovery and restart actions, checking the consistency of TCB files and
directories, changing system configuration parameters (e.g., table sizes, devices and device drivers, etc.),
running periodic system-integrity checks, and repairing object inconsistencies and damaged labels. A list
of the approved tools for TCB recovery, relevant commands, exceptions, warnings, and advice also should
be in the TFM.
5.3.2 Test Documentation
The trusted recovery testing documentation consists of test plan, test program, and test result
documentation. The general structure of the trusted-recovery test plan and test results is the same as
that of all other test plans and results. For example, the test plans should contain a test condition section,
a test data section (i.e., including a test environment setup, test parameters, and expected test
outcomes), and test coverage analysis.
However, the content of these sections should differ substantially from that of other test plans. For
example, the test conditions should identify the type of discontinuity of operation (and the induced TCB or
media failure) generated by using the administrative interfaces for the current test. In the test data area,
the environment setup should define the system initialization data, including TCB and user-level data
structures and objects, which are necessary to generate the specified discontinuity of operation. The
parameters and the commands used by administrators to generate discontinuity of operation also should
be listed.
The outcomes of the test should include the specification of the automated (e.g., reboot, warm-start)
procedures and of the manual (e.g., cold-start, emergency restart) procedures for trusted recovery and
their expected effects on the system.
The coverage analysis should explain the scope of the tests in terms of the classes of discontinuities
covered by the test and the classes of spontaneous failures remaining uncovered because of the inability to
induce them by administrative action.
5.3.3 Design Documentation
The documentation of the trusted-recovery design should include the following items corresponding to the
B3 and A1 requirements of the TCSEC:
a. Description of the anticipated classes of failures and discontinuities of operation handled,
automatically or using administrative procedures, by trusted recovery.
b. Trusted recovery philosophy (e.g., use of failure-atomicity in the design of TCB primitives or of
non-atomic actions which allow recovery of secure states, and the type of recovered secure state:
the input-secure state or output-secure state of a transition, or some arbitrary secure state).
c. Warnings about the "unanticipated" failures that can't be handled in a routine manner.
d. State-security invariants and constraints maintained by trusted recovery.
e. Descriptive Top-Level Specification (DTLS) of the TCB primitives implementing trusted-recovery
functions.
The accuracy of the design documentation should be commensurate with that of other similar
documentation for B3 and A1 systems. In this area there are no substantive differences between the B3
and A1 requirements. This is true for the same reasons as those discussed in the design specification and
verification area.
6 SATISFYING THE TCSEC REQUIREMENTS
In the TCSEC, there are no requirements for Trusted Recovery for security classes below class B3.
Furthermore, security policy and accountability requirements which also may apply to trusted recovery are
already included in the requirements for trusted facility management [24]. This chapter includes only
additional requirements and recommendations specific to trusted recovery.
6.1 REQUIREMENTS FOR SECURITY CLASS B3
6.1.1 Operational Assurance
6.1.1.1 System Architecture
The TCB programs and data structures implementing trusted recovery must meet the following
requirements:
a. Satisfy modularity requirements.
b. Make significant use of abstraction, information hiding, and layering in the design and
implementation of trusted recovery functions.
6.1.1.2 Trusted Facility Management
Trusted recovery functions shall be assigned exclusively to administrative personnel with security-relevant
responsibility, e.g., System Programmer or Security Administrator roles [24].
6.1.2 Life-Cycle Assurance
6.1.2.1 Security Testing
Security testing requirements of class B3 apply to the functions and interfaces of the TCB for user-induced
failures (i.e., for state-transition failures as defined in Chapter 2 of this guideline), and to functions and
interfaces of administrative roles but only for discontinuities generated by administrative personnel. See
discussion in Section 5.2.
6.1.2.2 Design Specification and Verification
DTLSs of the TCB functions and interfaces implementing trusted recovery must be maintained that
completely and accurately describe these functions and interfaces in terms of exceptions, error messages,
and effects.
a. A formal security model should be used to define the TCB response to state-transition failures
(defined in Chapter 2 and discussed in Section 5.2).
b. A formal security model should be used for the derivation of the security policy invariants and
constraints used for the design of trusted recovery.
c. Additional invariants and constraints should be used for the design of trusted recovery in the
accountability area as needed.
6.1.2.3 Configuration Management
All configuration management requirements of class B3 apply to trusted recovery as stated.
6.1.3 Documentation
6.1.3.1 Trusted Facility Manual
The following items should be included in the trusted recovery section of the Trusted Facility Manual:
a. Procedures for analysis of system dumps, for consistency checking of TCB objects, and for
system cold start and emergency restart.
b. A description of the types of tolerated failures and examples of the recommended procedures
for responding to such failures.
c. Procedures for running periodic integrity checks on the TCB database and for repairing damaged
security labels.
d. Procedures for handling inconsistencies of the system objects (e.g., duplicate allocation of disk
blocks to objects, inconsistent object links).
e. Lists of commands, system calls, and function definitions for trusted recovery (whenever these
aren't documented in the system's DTLS).
f. Examples of, and warnings about, potential misuse of trusted recovery procedures.
6.1.3.2 Test Documentation
All test documentation requirements of class B3, except those for covert channel testing (viz., Section
5.1), apply to the TCB functions and interfaces implementing trusted recovery as stated. The test plans
for trusted recovery should include the following:
a. Test conditions; i.e., a list of discontinuities of operation that can be generated through
administrative interfaces and their effects on the system.
b. Test data, consisting of the following:
(1) Environment setup; e.g., the TCB and user-level data structures and objects needed to
generate the planned discontinuity.
(2) Parameters and commands used by the administrators to generate the discontinuity.
(3) Expected outcome; e.g., the type of procedures that are started automatically or
manually for handling the generated discontinuity and the effect of those procedures on the
system state.
c. Coverage analysis; e.g., this includes a list of failures, or classes of failures, whose effect is
covered by the generated discontinuities, and a list of spontaneous failures, or classes of failures,
whose effect isn't covered by the test.
6.1.3.3 Design documentation
Documentation shall describe the following:
a. Interfaces between the TCB modules implementing trusted recovery functions.
b. Specific TCB protection mechanisms used to ensure that trusted-recovery functions are available
only to administrative users.
c. DTLS of the TCB modules implementing interfaces of trusted recovery. (Formal Top-Level
Specifications (FTLS) aren't required for trusted recovery interfaces; viz., the relevant discussion in
[24] for administrative interfaces.)
Design documentation also should include a description of the following:
a. Anticipated classes of failures and discontinuities of operation handled by trusted recovery,
automatically or using administrative procedures.
b. Trusted recovery philosophy; viz., Section 5.3.
c. Warnings concerning the "unanticipated" (i.e., rare) failures that can't be handled in a routine
manner.
d. State-security invariants and constraints maintained by trusted recovery.
e. DTLS of the TCB primitives implementing trusted-recovery interfaces.
The accuracy of the design documentation should be commensurate with that of other similar
documentation for B3 and A1 systems.
6.2 ADDITIONAL REQUIREMENTS OF SECURITY CLASS A1
All requirements of the security class B3 are included here. The only additional requirements are in the
following life-cycle assurance areas.
6.2.1 Additional Life-Cycle Assurance Requirements
6.2.1.1 Configuration Management
All additional configuration management requirements of class A1 apply as stated.
6.2.1.2 Trusted Distribution
All trusted distribution requirements of class A1 apply to the TCB functions and interfaces implementing
trusted recovery as stated.
GLOSSARY
ACCESS
A specific type of interaction between a subject and an object that results in the flow of information from
one to the other.
ADMINISTRATOR
See Security Administrator.
APPROVAL/ACCREDITATION
The official authorization that is granted to an ADP system to process sensitive information in its
operational environment, based upon comprehensive security evaluation of the system's hardware,
firmware, and software security design, configuration, and implementation and of the other system
procedural, administrative, physical, TEMPEST, personnel, and communications security controls.
AUDIT
To conduct the independent review and examination of system records and activities.
AUDITOR
An authorized individual, or role, with administrative duties, which include selecting the events to be
audited on the system, performing system operations to enable the recording of those events, and
analyzing the trail of audit events.
AUDIT MECHANISM
The device, or devices, used to collect, review, and/or examine system activities.
AUDIT TRAIL
A chronological record of system activities that is sufficient to enable the reconstruction, reviewing, and
examination of the sequence of environments and activities surrounding or leading to an operation, a
procedure, or an event in a transaction from its inception to final results.
CATEGORY
A restrictive label that has been applied to classified or unclassified data as a means of increasing the
protection of the data and further restricting access to the data.
CRASH
A system failure that causes the processors' registers to be reset to some standard values.
DATA
Information with a specific physical representation.
DESCRIPTIVE TOP-LEVEL SPECIFICATION (DTLS)
A top-level specification that is written in a natural language (e.g., English), an informal program design
notation, or a combination of the two.
DISCRETIONARY ACCESS CONTROL (DAC)
A means of restricting access to objects based on the identity and need-to-know of the user, process, and/
or groups to which they belong, or based on the possession of system-protected tickets that contain
privileges for objects (e.g., capabilities). The controls are discretionary in the sense that a subject with a
certain access permission is capable of passing that permission (perhaps indirectly) on to any other
subject.
FORMAL SECURITY POLICY MODEL
A mathematically precise statement of a security policy. To be adequately precise, such a model must
represent the initial state of a system, the way in which the system progresses from one state to another,
and a definition of a "secure" state of the system. To be acceptable as a basis for a TCB, the model must
be supported by a formal proof that if the initial state of the system satisfies the definition of a "secure"
state and if all assumptions required by the model hold, then all future states of the system will be
secure. Some formal modeling techniques include: state transition models, denotational semantics
models, and algebraic specification models.
FORMAL TOP-LEVEL SPECIFICATION (FTLS)
A top-level specification that is written in a formal mathematical language to allow theorems showing the
correspondence of the system specification to its formal requirements to be hypothesized and formally
proven.
IDEMPOTENT ACTIONS
An ordered list of actions (e.g., procedure calls, etc.) is said to be idempotent if repeated incomplete
executions of that list of actions followed by a complete execution has the effect of a single complete
execution of that list of actions. An idempotent action is a restartable action; i.e., if the action was in
progress at the time of a crash, the action can be repeated during crash recovery with no undesirable side
effects [14,21].
OBJECT
A passive entity that contains or receives information. Access to an object potentially implies access to the
information it contains. Examples of objects are: records, blocks, pages, segments, files, directories,
directory trees, programs, bits, bytes, words, fields, processors, video displays, keyboards, clocks,
printers, and network nodes.
OPERATOR
An administrative role or user assigned to perform routine maintenance operations of the ADP system and
to respond to routine user requests.
PASSWORD
A protected/private character string used to authenticate an identity.
PROCESS
A program in execution.
READ
A fundamental operation that results only in the flow of information from an object to a subject.
SECURITY ADMINISTRATOR
An administrative role or user responsible for the security of an Automated Information System and
having the authority to enforce the security safeguards on all others who have access to the Automated
Information System (with the possible exception of the Auditor).
SECURITY LEVEL
The combination of a hierarchical classification and a set of non-hierarchical categories that represents the
sensitivity of information.
SECURITY MAP
A map defining the correspondence between the binary and ASCII formats of security levels (e.g.,
between binary format of security levels and sensitivity labels).
SECURITY POLICY
The set of laws, rules, and practices that regulate how an organization manages, protects, and distributes
sensitive information.
SECURITY POLICY MODEL
A presentation of the security policy model enforced by the system. It must identify the set of rules and
practices that regulate how a system manages, protects, and distributes sensitive information.
SECURITY TESTING
A process used to determine that the security features of a system are implemented as designed. This
includes hands-on functional testing, penetration testing, and verification.
SUBJECT
An active entity, generally in the form of a person, process, or device, that causes information to flow
among objects or changes the system state. Technically, a process/domain pair.
SYSTEM PROGRAMMER
An administrative role or user responsible for the trusted system distribution, configuration, installation,
and non-routine maintenance.
TOP-LEVEL SPECIFICATION (TLS)
A non-procedural description of system behavior at the most abstract level; typically, a functional
specification that omits all implementation details.
TRAP DOOR
A hidden software or hardware mechanism that can be triggered to permit system protection mechanisms
to be circumvented. It is activated in some innocent-appearing manner (e.g., a special "random" key
sequence at a terminal). Software developers often introduce trap doors in their code to enable them to
re-enter the system and perform certain functions. Synonymous with back door.
TROJAN HORSE
A computer program with an apparently or actually useful function that contains additional (hidden)
functions that surreptitiously exploit the legitimate authorizations of the invoking process to the detriment
of security. For example, making a "blind copy" of a sensitive file for the creator of the Trojan Horse.
TRUSTED COMPUTING BASE (TCB)
The totality of protection mechanisms within a computer system---including hardware, firmware, and
software---the combination of which is responsible for enforcing a security policy. A TCB consists of one
or more components that together enforce a unified security policy over a product or system. The ability
of a TCB to enforce correctly a unified security policy depends solely on the mechanisms within the TCB
and on the correct input by system administrative personnel of parameters (e.g., a user's clearance level)
related to the security policy.
USER
Person or process accessing an Automated Information System either by direct connections (i.e., via
terminals), or indirect connections (i.e., prepare input data or receive output that is not reviewed for
content or classification by a responsible individual).
VERIFICATION
The process of comparing two levels of system specification for proper correspondence (e.g., security
policy model with top-level specification, TLS with source code, or source code with object code). This
process may or may not be automated.
WRITE
A fundamental operation that results only in the flow of information from a subject to an object.
BIBLIOGRAPHY
[1] Bach, M. J., The Design of the UNIX Operating System, Prentice-Hall Inc., Englewood Cliffs, New
Jersey, 1986.
[2] Baldwin, R. W., Rule-Based Analysis of Computer Security, Massachusetts Institute of Technology,
Cambridge, Massachusetts, Technical Report MIT/LCS/TR-401, March 1988.
[3] Bell, D. E., and L. J. La Padula, Secure Computer System: Unified Exposition and Multics
Interpretation, MITRE Corp., Bedford, Massachusetts, Rep. No. MTR-2997, 1976. Available as NTIS
AD-A023 588.
[4] Benzel, T. V., and Travilla, D. A., "Trusted Software Verification: A Case Study," Proceedings of the
IEEE Symposium on Security and Privacy, Oakland, California, April 1985, pp. 14-30.
[5] Bina, E. J., and P. A. Emrath, "A Faster fsck for BSD UNIX," in Proceedings of the USENIX Conference,
San Diego, California, February 1989, pp. 173- 185.
[6] Chang, A., M. Mergen, S. Porter, R. Rader, and J. Roberts, "Evolution of Storage Facilities in the AIX
System," in IBM RISC System/6000 Technology, SA23-2619, IBM Corporation, Austin Communications
http://www.windowsecurity.com/whitepapers/NCSCTG022_Yellow_book_.html?printversion (39 of 42) [11/27/2008 8:03:36 AM]
NCSC-TG-022: Yellow book
Department, 11400 Burnet Road, Austin, TX 78758, pp. 138---142.
[7] Chang, A., and M. Mergen, "801 Storage: Architecture and Programming," ACM Transactions on
Computer Systems, vol. 6, no. 2, February 1988, pp.28-50.
[8] Cristian, F., "Correct and Robust Programs," IEEE Transactions on Software Engineering, SE-1 0/2,
March 1984, pp. 163-174.
[9] Cristian, F., "A Rigorous Approach to Fault-Tolerant Programming," IEEE Transactions on Software
Engineering, SEll/I, January 1985, pp. 23-31.
[10] Department of Defense, Security Requirements for Automated Information Systems (AISs), DoD
Directive 5200.28,21 March 1988.
[11] Gasser, M., Building A Secure Computer System, Van Nostrand Reinhold, New York, 1988.
[12] Gligor, V. D., J. C. Huskamp, S. R. Welke, C. J. Linn, W. T. Mayfield, Traditional Capability-Based
Systems: An Analysis of their Ability to Meet the Trusted Computer Security Evaluation Criteria, Institute
for Defense Analyses, Alexandria, VA. IDA Paper P-1935, February 1987; available as NTlS AD-B119332.
[13] Gligor, V. D., "A Note on the Denial-of-Service Problem," Proceedings of the 1983 IEEE Symposium
on Security and Privacy, Oakland, California, April 1983, pp. 5101-5111.
[14] Gray, J. N., "Notes on Database Operating Systems," in Operating Systems---An Advanced Course, R.
Bayer, R. M. Graham, and G. Seegmuller, eds., Springer-Verlag, New York, 1978, pp. 393~81. Also
published as IBM Research Report RJ 2188, February 1978.
[15] Gray, J. N., Paul McJones, Mike Blasgen, Bruce Lindsay, Raymond Lorie, Tom Price, Franco Putzolu,
and Irving Traiger, "The Recovery Manager of the System R Database Manager," Computing Surveys,
13/2, June 1981, pp.223-242.
[16] Hecht, M. S., and Gabbe, J. D., "Shadowed Management of Free Disk Pages with a Linked List," ACM
Transactions on Database Systems, 8/4, December 1983, pp. 503-514.
[17] National Computer Security Center, Department of Defense Trusted Computer System Evaluation
Criteria, DOD 5200.28-STD, December 1985.
[18] IBM Corp., "Information Management System/Virtual Systems (MS/VS), Programming Reference
Manual," IBM Form No. SH20-9027-2, Section 5.
[19] IBM Corp., Secure Xenix, Version 1.1---System Administrators Guide, June 1987.
[20] Israel, J., J. Mitchell, and H. Sturgis, "Separating Data from Function in a Distributed File System,"
Proceedings of the Second International Symposium on Operating Systems, IRlA, Rocquencourt, France,
October 1978.
http://www.windowsecurity.com/whitepapers/NCSCTG022_Yellow_book_.html?printversion (40 of 42) [11/27/2008 8:03:36 AM]
NCSC-TG-022: Yellow book
[21] Lampson, B. W., "Atomic Transactions," in Distributed Systems ---an Advanced Course, B. W.
Lampson, M. Paul, and H. J. Siegert, eds., SpringerVerlag, New York, 1981, pp. 246-265.
[22] Lampson, B. W., Robert F. Sproull, "An Open Operating System for a Single User Machine," in
Proceedings of the Seventh Symposium on Operating Systems Principles, Pacific Grove, California,
December 1979, pp. 98-105.
[23] Mitchell, J. G, and J. Dion, "A Comparison of Two Network-Based File Servers," Communications of
the ACM, 25/4, April 1982, pp. 233-245.
[24] National Computer Security Center, A Guide to Understanding Trusted Facility Management, NCSCTG-01 5, version 1,18 October 1989.
[25] Paxton, W. H., "A Client-Based Transaction System to Maintain Data Integrity," Proceedings of the
Seventh Symposium on Operating Systems Principles, Pacific Grove, California, December 1979, pp. 18--23.
[26] Saltzer, J. H., "Protection and Control of Information Sharing in Multics," Communications of the
ACM, vol. 17, no. 8, July 1974, pp. 388---402.
[27] Swinehart, Daniel, Gene McDaniel Boggs, "WFS: a Simple Shared File System for a Distributed
Environment," Proceedings of the Seventh Symposium on Operating Systems Principles, Pacific Grove,
California, December 1979, pp. 9-17.
[28] Walker, S. T., "The Advent of Trusted Computer Operating Systems,"
National Computer Conference Proceedings, May, 1980, pp. 655---665.
[29] Walter, K. J., W. F. Ogden, W. C. Pounds, F. T. Bradshaw, S. R. Ames, K. J. Biba, J. M. Gilligan, D. D.
Schaefer, S. l. Schaen, D. G. Shumway, Modeling the Security Interface, Technical Report, Case Western
Reserve, University, Cleveland, Ohio, August 1974.