Barrelfish specification Tech note

Barrelfish Specification
Andrew Baumann
Simon Peter
Timothy Roscoe
Adrian Schüpbach
Akhilesh Singhania
Revision 862 of 2012-05-05
Acknowledgements
Paul, Rebecca, Tim, et al.
Contents
1. Introduction
2. Barrelfish Kernel API
2.1. System Calls
2.1.1. Invoke
2.1.2. Yield
2.1.3. Debug system calls
2.2. Dispatch and Execution
2.2.1. Disabled
2.2.2. Register save areas
2.2.3. Dispatcher Entry Points
2.3. Inter-Dispatcher Communication
2.3.1. Endpoints
2.3.2. Message Transfer
2.3.3. Capability transfer
2.3.4. Interrupt delivery
2.3.5. Exception delivery
2.4. Virtual Memory
2.5. Initial Address Space
2.5.1. User-Space Perspective
2.5.2. Kernel Perspective
2.6. Scheduling
2.7. TODO
3. Barrelfish Library API
3.1. Capabilities
3.1.1. Data types
3.1.2. Functions
3.1.3. Invocations
3.1.4. Syscalls
3.2. VSpace management
3.3. Dispatch and threading
3.4. Spawning domains
3.4.1. Initial capability space
A. Glossary
B. Implementation
B.1. Mapping Database
B.1.1. Implementation details
B.1.2. Invariants
B.1.3. Current Limitations
C. Architecture-Specific Features
C.1. x86-64
C.1.1. VSpace
C.1.2. IO capabilities
C.1.3. Interrupts and Exceptions
List of Tables
2.1. Dispatcher control structure
List of Figures
2.1. Dispatcher state save areas
2.2. Typical dispatcher states
2.3. init’s initial capability space layout
3.1. initial capability space layout of user tasks
1. Introduction
Barrelfish is...
2. Barrelfish Kernel API
2.1. System Calls
2.1.1. Invoke
This system call takes at least one argument, which must be the address of a capability in the
caller’s CSpace. The remaining arguments, if any, are interpreted based on the type of this first
capability.
Other than yielding, all kernel operations including IDC are provided by capability invocation,
and make use of this call. The possible invocations for every capability type are described in the
capability management document.
This system call may only be used while the caller is enabled. The reason is that the caller
must be prepared to receive a reply immediately and that is only possible when enabled, as it
requires the kernel to enter the dispatcher at the IDC entry point.
2.1.2. Yield
This system call yields the CPU. It takes a single argument, which must be either the CSpace
address of a dispatcher capability, or CPTR_NULL. In the first case, the given dispatcher is run
unconditionally; in the latter case, the scheduler picks which dispatcher to run.
This system call may only be used while the caller is disabled. Furthermore, it clears the
caller’s disabled flag, so the next time it will be entered is at the run entry point.
2.1.3. Debug system calls
The following debug system calls may also be supported, depending on build options, but are
not part of the regular kernel interface.
No-op
This call takes no arguments, and returns directly to the caller. It always succeeds.
Print
This call takes two arguments: an address in the caller’s vspace, which must be mapped, and a
size, and prints the string found at that address to the console. It may fail if any part of the string
is not accessible to the calling domain.
Reboot
This call unconditionally hard reboots the system. [This call should be removed -AB]
Debug
[TODO: document me]
2.2. Dispatch and Execution
A dispatcher consists of code executing at user-level and a data structure located in pinned
memory, split into two regions. One region is only accessible from the kernel, the other region is
shared read/write between user and kernel. The fields in the kernel-defined part of the structure
are described in Table 2.1.
Beyond these fields, the user may define and use their own data structures (e.g. a stack for the
dispatcher code to execute on, thread management structures, etc).
2.2.1. Disabled
A dispatcher is considered disabled by the kernel if either of the following conditions is true:
• its disabled word is non-zero
• its program counter is within the range specified by the crit_pc_low and crit_pc_high fields
The disabled state of a dispatcher controls where the kernel saves its registers, and is described
in the following subsection. When the kernel resumes a dispatcher that was last running while
disabled, it restores its machine state and resumes execution at the saved instruction, rather than
upcalling it at an entry point.
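The two conditions above can be sketched as a single predicate. This is a minimal illustration, not the kernel's code: the structure name `dispatcher_ctl`, the field types and the function name are assumptions, and only the three fields named in Table 2.1 are modelled. Because crit_pc_high is described as the address immediately after the last critical instruction, the range test is treated as half-open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the dispatcher control structure (see Table 2.1). */
struct dispatcher_ctl {
    uintptr_t disabled;     /* non-zero: dispatcher has marked itself disabled */
    uintptr_t crit_pc_low;  /* first instruction of the critical code section */
    uintptr_t crit_pc_high; /* address immediately after its last instruction */
};

/* Returns true if the dispatcher counts as disabled when stopped at pc. */
bool dispatcher_is_disabled(const struct dispatcher_ctl *d, uintptr_t pc)
{
    assert(d != NULL);
    if (d->disabled != 0) {
        return true;
    }
    /* Half-open range: crit_pc_high itself lies outside the section. */
    return pc >= d->crit_pc_low && pc < d->crit_pc_high;
}
```

With this reading, a dispatcher whose disabled word is zero is still treated as disabled while its program counter lies inside the critical section.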
2.2.2. Register save areas
The dispatcher structure contains enough space for three full copies of the machine register
state to be saved. The trap_save_area is used whenever the dispatcher takes a trap, regardless
of whether it is enabled or disabled. Otherwise, the disabled_save_area is used whenever the
dispatcher is disabled (see above), and the enabled_save_area is used in all other cases.
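The selection rule in the paragraph above can be written down as a small function. This is a sketch under stated assumptions: `struct regs` is a placeholder for the architecture-specific register state, and `struct dispatcher_save` and `select_save_area` are hypothetical names; only the three save-area field names come from Table 2.1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the architecture-specific register state. */
struct regs {
    uint64_t r[16];
};

/* Hypothetical container for the three save areas of a dispatcher. */
struct dispatcher_save {
    struct regs enabled_save_area;
    struct regs disabled_save_area;
    struct regs trap_save_area;
};

/* Picks the area the kernel saves into: traps take precedence over the
 * enabled/disabled distinction. */
struct regs *select_save_area(struct dispatcher_save *d, bool took_trap,
                              bool disabled)
{
    assert(d != NULL);
    if (took_trap) {
        return &d->trap_save_area;
    }
    return disabled ? &d->disabled_save_area : &d->enabled_save_area;
}
```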
Figure 2.1 shows the important dispatcher states and the register save area into which state is saved on each state transition (Trap and PageFault states have been left out for brevity). The starting
state for a domain is “notrunning” and is depicted with a bold border in the figure.
Arrows from right to left involve saving state into the labeled area. Arrows from left to right
involve restoring state from the labeled area. It can be seen that no state can be overwritten.
The kernel can recognize a disabled dispatcher by looking at the disabled flag, as well as the
domain’s instruction pointer. Nothing else needs to be examined.
The dispatcher states are also depicted in Figure 2.2.
Table 2.1.: Dispatcher control structure
Field name         | Size                   | Kernel R/W | Short description
-------------------+------------------------+------------+------------------
disabled           | word                   | R/W        | If non-zero, the kernel will not upcall the dispatcher, except to deliver a trap.
haswork            | pointer                | R          | If non-zero, the kernel will consider this dispatcher eligible to run.
crit_pc_low        | pointer                | R          | Address of first instruction in dispatcher’s critical code section.
crit_pc_high       | pointer                | R          | Address immediately after last instruction of dispatcher’s critical code section.
entry points       | 4 function descriptors | R          | Functions at which the dispatcher code may be invoked
enabled_save_area  | arch specific          | W          | Area for kernel to save register state when enabled
disabled_save_area | arch specific          | R/W        | Area for kernel to save and restore register state when disabled
trap_save_area     | arch specific          | W          | Area for kernel to save register state when a trap or a pagefault while disabled occurs
recv_cptr          | capability pointer     | R          | Address of CNode to store received capabilities of next local IDC into
recv_bits          | word                   | R          | Number of valid bits within recv_cptr
recv_slot          | word                   | R          | Slot within CNode to store received capability of next local IDC into
[Figure 2.1: state diagram. Labels recovered from the diagram: states “notrunning disabled”, “running disabled”, “notrunning enabled”, “running enabled”; transitions labeled enabled_save_area and disabled_save_area.]
Figure 2.1.: Dispatcher state save areas. Trap and PageFault states omitted for brevity. Regular
text and lines denote state changes by the kernel. Dashed lines and italic text denote
state changes by user-space, which do not necessarily have to use the denoted save
area. The starting state is in the bold node.
2.2.3. Dispatcher Entry Points
Unless restoring it from a disabled context, the kernel always enters a dispatcher at one of the
following entry points. Whenever the kernel invokes a dispatcher at any of its entry points, it
sets the dispatcher’s disabled flag. One (ABI-specific) register always points to the dispatcher structure.
The value of all other registers depends on the entry point at which the dispatcher is invoked,
and is described below.
The entry points are:
Run A dispatcher is entered at this entry point when it was not previously running, the last
time it ran it was either enabled or yielded the CPU, and the kernel has given it the CPU.
Other than the register that holds a pointer to the dispatcher itself, all other registers are
undefined. The dispatcher’s last machine state is saved in the enabled_save_area.
PageFault A dispatcher is entered at this entry point when it suffers a page fault while enabled.
On entry, the dispatcher register is set, and the argument registers contain information
about the cause of the fault. Volatile registers are saved in the enabled_save_area; all other
registers contain the user state at the time of the fault.
PageFault_Disabled A dispatcher is entered at this entry point when it suffers a page fault
while disabled. On entry, the dispatcher register is set, and the argument registers contain
information about the cause of the fault. Volatile registers are saved in the trap_save_area;
all other registers contain the user state at the time of the fault.
Trap A dispatcher is entered at this entry point when it is running and it raises an exception
(for example, illegal instruction, divide by zero, breakpoint, etc.). Unlike the other entry
points, a dispatcher may be entered at its trap entry even when it was running disabled.
The machine state at the time of the trap is saved in the trap_save_area, and the argument
registers convey information about the cause of the trap.
LRPC A dispatcher is entered at this entry point when an LRPC message (see below) is delivered to it. This can only happen when it was not previously running, and was enabled. On
entry, four registers are delivered containing the message payload, one stores the endpoint
offset, and another contains the dispatcher pointer.
Figure 2.2 shows the states a dispatcher can be in and how it gets there. The exceptional
states Trap and PageFault have been omitted for brevity.
[Figure 2.2: state diagram. Labels recovered from the diagram: run, notrunning, running, idc, Preempt, schedule(), resume(), idc_local(), syscall().]
Figure 2.2.: Typical dispatcher states. Trap and PageFault states omitted for brevity. Regular
text and lines denote state changes by the kernel. Dashed lines and italic text denote
state changes by user-space. The starting state is in bold.
2.3. Inter-Dispatcher Communication
Inter-dispatcher communication (IDC) is a kernel-supported mechanism to allow dispatchers to
communicate by sending messages. IDC is executed by invoking an IDC endpoint capability
referring to a receiving dispatcher.
2.3.1. Endpoints
IDC communication takes place between dispatchers via endpoints. An endpoint is created
by retyping a dispatcher capability into an IDC endpoint capability. It refers to exactly one
dispatcher, and to one endpoint buffer structure within that dispatcher. An endpoint buffer is
a kernel-specified data structure located within the dispatcher frame, where the kernel delivers
IDC messages.
The kernel guarantees messages to either be delivered to the receiving dispatcher or to return
to the sender with an error status code in the event that the receiver is unable to receive the
message. This implies that messages are never dropped silently by the kernel but does not
guarantee that messages are never dropped on the whole communication path, which involves
the receiving dispatcher.
It should be noted that endpoint capabilities may be freely copied, and do not uniquely identify
a sender. An endpoint capability can be transferred to several dispatchers, all of whom may use
the same endpoint and thus the same buffer when sending messages.
2.3.2. Message Transfer
To send an IDC message, a dispatcher invokes an endpoint capability referring to the receiving dispatcher. The message it wishes to send is provided as an argument to the invocation, along with flags that specify
additional parameters influencing the message transfer.
An IDC message is delivered by the kernel allocating space in the receiver’s endpoint buffer,
and writing the message contents. The receiver must poll its endpoints to detect incoming
messages, and consume them in order to free space in the endpoint buffer for new messages.
[TODO: detail!]
A dispatcher that executes an IDC invocation is considered to have yielded the CPU while
enabled. Therefore, the next time it is entered may be either at the Run or LRPC entry points.
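The delivery scheme described in this subsection (the kernel allocates space in the receiver's endpoint buffer and writes the message, the receiver polls and consumes messages to free space) can be illustrated with a fixed-size ring. The actual buffer layout is not specified here, so everything in this sketch is an assumption: the capacity `EP_SLOTS`, the four-word message, and the names `ep_buffer`, `ep_deliver` and `ep_poll`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EP_SLOTS 4 /* hypothetical capacity, in messages */

/* Hypothetical fixed-size message of four payload words. */
struct ep_msg {
    uintptr_t words[4];
};

/* Hypothetical endpoint buffer: a ring of message slots. */
struct ep_buffer {
    struct ep_msg slots[EP_SLOTS];
    size_t head;  /* index of the oldest unconsumed message */
    size_t count; /* number of unconsumed messages */
};

/* Kernel side: write the message, or fail so the sender sees an error. */
bool ep_deliver(struct ep_buffer *ep, const struct ep_msg *m)
{
    assert(ep != NULL && m != NULL);
    if (ep->count == EP_SLOTS) {
        return false; /* buffer full: message is not dropped silently */
    }
    ep->slots[(ep->head + ep->count) % EP_SLOTS] = *m;
    ep->count++;
    return true;
}

/* Receiver side: poll the endpoint; consuming a message frees its slot. */
bool ep_poll(struct ep_buffer *ep, struct ep_msg *out)
{
    assert(ep != NULL && out != NULL);
    if (ep->count == 0) {
        return false; /* nothing pending */
    }
    *out = ep->slots[ep->head];
    ep->head = (ep->head + 1) % EP_SLOTS;
    ep->count--;
    return true;
}
```

The failure return of `ep_deliver` corresponds to the guarantee that a message is either delivered or reported back to the sender with an error.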
2.3.3. Capability transfer
IDC can also be used to transfer capabilities from the sending dispatcher’s domain to a receiving
dispatcher’s domain. [TODO: document cap transfer!]
Flags
If the “sync” flag is set and the message transfer succeeds, the kernel will immediately dispatch
the receiver. Effectively, the sender yields to the receiver.
If the “yield” flag is set, and the message fails for one of the following reasons:
• the receiver’s message buffer is full
• the sender specified a capability, but it cannot be delivered because the receiver’s capability receive slot is non-empty
then the kernel will also immediately dispatch the receiver (without performing a message
transfer). Again, in this case the sender effectively yields to the receiver.
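The flag rules above reduce to a small decision function. The sketch below is illustrative only: the flag bit values, the `idc_result` codes and the function name are hypothetical, and the code covers just the two flags discussed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag encodings; the real values are not given in this text. */
#define IDC_FLAG_SYNC  (1u << 0)
#define IDC_FLAG_YIELD (1u << 1)

/* Hypothetical outcome codes of a message transfer attempt. */
enum idc_result {
    IDC_OK,
    IDC_ERR_BUF_FULL,      /* receiver's message buffer is full */
    IDC_ERR_CAP_SLOT_FULL, /* receiver's capability receive slot is non-empty */
    IDC_ERR_OTHER
};

/* Returns true if the kernel immediately dispatches the receiver. */
bool idc_dispatch_receiver(unsigned flags, enum idc_result result)
{
    assert((flags & ~(IDC_FLAG_SYNC | IDC_FLAG_YIELD)) == 0);
    if (result == IDC_OK) {
        return (flags & IDC_FLAG_SYNC) != 0;
    }
    if (result == IDC_ERR_BUF_FULL || result == IDC_ERR_CAP_SLOT_FULL) {
        return (flags & IDC_FLAG_YIELD) != 0;
    }
    return false; /* other failures: no directed dispatch */
}
```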
LRPC
[TODO: document!]
In this mode of IDC, the kernel performs a controlled context switch from the sending to the
receiving dispatcher, preserving the capability invocation register state which is used to deliver
the message. The sender dispatcher is not blocked; however, it implicitly donates the remainder
of its timeslice to the receiver.
If the receiving dispatcher is disabled, and the “yield” flag was set, the kernel sets the return
register in the sending dispatcher’s enabled_save_area to SYS_ERR_TARGET_DISABLED. The
kernel then switches to and resumes the target dispatcher. In effect, an LRPC operation when
the target is disabled becomes a directed yield of the CPU to the target dispatcher. If the “yield”
flag was not set, the kernel simply returns the same error code to the sender and runs the sender.
2.3.4. Interrupt delivery
Hardware interrupts are delivered by the kernel as asynchronous IDC messages to a registered
dispatcher. A dispatcher can be registered for a specific IRQ by invoking the IRQTable
capability, passing it an IDC endpoint to the dispatcher and the IRQ number. It is not possible
for multiple IDC endpoints to be registered with the same IRQ number at any one time.
Henceforth, the kernel will send an IDC message using asynchronous delivery to the registered endpoint. Asynchronous IDC is used as it does not cause priority inversion by directly
dispatching the target dispatcher.
Refer to Appendix C for more information about valid hardware interrupts for an architecture-specific implementation of Barrelfish.
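The registration rule in this subsection (at most one endpoint per IRQ number at any one time) can be sketched as a table lookup. This is not the IRQTable invocation interface: the table layout, the size `NUM_IRQS`, the error codes and the function name are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_IRQS 256 /* assumption: one slot per interrupt vector */

/* Hypothetical result codes. */
enum irq_status { IRQ_OK, IRQ_ERR_BAD_IRQ, IRQ_ERR_IN_USE };

/* Hypothetical table mapping IRQ numbers to registered endpoints. */
struct irq_table {
    const void *endpoint[NUM_IRQS]; /* NULL means no endpoint registered */
};

/* Registers an endpoint for an IRQ; refuses a second registration. */
enum irq_status irq_table_register(struct irq_table *t, unsigned irq,
                                   const void *endpoint)
{
    assert(t != NULL && endpoint != NULL);
    if (irq >= NUM_IRQS) {
        return IRQ_ERR_BAD_IRQ;
    }
    if (t->endpoint[irq] != NULL) {
        return IRQ_ERR_IN_USE; /* only one endpoint per IRQ at a time */
    }
    t->endpoint[irq] = endpoint;
    return IRQ_OK;
}
```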
2.3.5. Exception delivery
When a CPU exception happens in user-space, it is reflected to the dispatcher on which it appeared. Page faults are dispatched to the pagefault entry point of the dispatcher. All other
exceptions are dispatched to the trap entry point of the dispatcher. The disabled flag of the
dispatcher is ignored in all cases and state is saved to the trap save area.
2.4. Virtual Memory
[TODO: Our memory model is based on capabilities and is quite similar to seL4.]
2.5. Initial Address Space
Our address space initialization is similar to the one of seL4, but we do not follow their boot
protocol to the word. Here is our version:
We have a special program called init that is run by the kernel after bootup as an ELF64
executable. In order to function, it has to receive some information from the kernel. We show
first how it receives this information from its (init’s) own perspective and then how the kernel
gathers and transmits this information to init.
2.5.1. User-Space Perspective
init’s virtual address space size at startup is at most 4 MBytes (the amount of pagetable kernel
memory left for it), mapped as 4K pages, starting from 0x200000 (2 Meg). init’s text/data
segments should be aligned consecutively and start at 0x400000 (4 Meg), leaving it 2 MBytes for
its text and data.
We have a bootinfo structure, shown in Listing 2.1.
Listing 2.1: bootinfo structure
struct bootinfo {
    // Base address of small memory caps region
    capaddr_t small_untyped_base;
    // Number of small memory caps
    size_t small_untyped_count;
    // Base address of large memory caps region
    capaddr_t large_untyped_base;
    // Number of large memory caps
    size_t large_untyped_count;
    // Number of entries in regions array
    size_t regions_length;
    // Memory regions array
    struct mem_region regions[MAX_MEM_REGIONS];
};
This structure is mapped into init’s virtual memory at address 0x200000 (2 Meg) and is at most
a 4K page in size. small_untyped_base points to the capability for the CNode holding a number
(given by small_untyped_count) of small untyped capabilities. These can be used for easy setup
of init’s own address space. large_untyped_base and large_untyped_count are similar for (much)
larger untyped capabilities. Their sizes can be found in the regions array, of size regions_length
entries. An entry is defined by a mem_region struct, shown in Listing 2.2.
Listing 2.2: mem_region structure
struct mem_region {
    paddr_t          base; // Address of the start of the region
    size_t           size; // Size of region in bytes
    enum region_type type; // Type of region
    uint64_t         data; // Additional data, based on region type
};
Its fields should be self-explanatory. The possible region types are defined by enum region_type,
shown in Listing 2.3.
These are the same as those in seL4.
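As an illustration of how init might consume this information, the sketch below walks the regions array and sums the sizes of all regions of one type. The structure and enumeration follow Listings 2.1 to 2.3, but the typedefs for paddr_t and capaddr_t, the value of MAX_MEM_REGIONS and the helper `bootinfo_total_size` are assumptions introduced to make the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;   /* assumption: 64-bit physical addresses */
typedef uint32_t capaddr_t; /* assumption: width of a CSpace address */
#define MAX_MEM_REGIONS 16  /* assumption: the real limit is not given here */

enum region_type {
    RegionType_Empty,
    RegionType_InitCaps,
    RegionType_RootTask,
    RegionType_Device,
    RegionType_CapsOnly
};

struct mem_region {
    paddr_t base;
    size_t size;
    enum region_type type;
    uint64_t data;
};

struct bootinfo {
    capaddr_t small_untyped_base;
    size_t small_untyped_count;
    capaddr_t large_untyped_base;
    size_t large_untyped_count;
    size_t regions_length;
    struct mem_region regions[MAX_MEM_REGIONS];
};

/* Hypothetical helper: total size in bytes of all regions of one type. */
uint64_t bootinfo_total_size(const struct bootinfo *bi, enum region_type type)
{
    assert(bi != NULL && bi->regions_length <= MAX_MEM_REGIONS);
    uint64_t total = 0;
    for (size_t i = 0; i < bi->regions_length; i++) {
        if (bi->regions[i].type == type) {
            total += bi->regions[i].size;
        }
    }
    return total;
}
```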
Initial Capability Address Space
init’s initial CSpace is shown in Figure 2.3.
[Figure 2.3: CSpace layout diagram. Labels recovered from the diagram (grouping inferred from the extracted text):
DCB: refers to rootcn and dispatcher
CSpace address prefix: 0...0 (20 bits)
rootcn: 0x0 taskcn, 0x1 pagecn, 0x2 smallcn, 0x3 supercn, 0x4 segcn, 0x5 phyaddrcn, ...
taskcn: 0x0 NULL, 0x1 DCB, 0x2 rootcn, 0x4 dispframe, 0x5 IRQTable, 0x6 IO, 0x7 BootInfo, 0x8 Kernel, ...
pagecn: 0x0 PML4, 0x1 PDPT, ..., PDIR, ..., PTABLE, ...
segcn: 0x0 .text, ..., .data, ..., Multiboot, ...]
Figure 2.3.: init’s initial capability space layout
Listing 2.3: region_type enumeration
enum region_type {
    RegionType_Empty,    // Empty memory
    RegionType_InitCaps, // init’s caps mapped here
    RegionType_RootTask, // Code/Data of init itself
    RegionType_Device,   // Memory-mapped device
    RegionType_CapsOnly  // Kernel-reserved memory
};
2.5.2. Kernel Perspective
In the following, ‘cn’ is short for CNode, init_dcb is short for init’s DCB, and replyep is short
for init’s system call reply endpoint. The kernel sets up init’s domain as follows:
• Allocate physical pages for rootcn, taskcn, smallcn, supercn, init_dcb and replyep.
• Map bootinfo, init_dcb, replyep, rootcn, pml4, pdpt, pdir and ptables, in that order.
• Allocate 64 physical pages and put untyped caps to them into smallcn.
• Map taskcn, smallcn, supercn, in that order.
• Set up init’s DCB.
• Load the init ELF64 binary into memory, map memory and allocate caps.
• Add all remaining memory to supercn as untyped caps to large regions of power-of-two size.
There may not be more than 64 of them.
• Fill in the bootinfo struct.
• schedule() init.
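The step that covers the remaining memory with power-of-two regions can be sketched as a greedy split. This is one possible approach, not necessarily what the kernel does: the names `region` and `split_pow2` are hypothetical, and the additional requirement that each region is aligned to its own size is an assumption that the text above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUPER_REGIONS 64 /* limit on regions placed into supercn */

/* Hypothetical descriptor: a region of 2^bits bytes starting at base. */
struct region {
    uint64_t base;
    uint8_t bits;
};

/* Greedily covers [base, base + size) with power-of-two regions, largest
 * first, each aligned to its own size. Returns the number of regions
 * written; memory left over once max is reached is simply not covered. */
size_t split_pow2(uint64_t base, uint64_t size, struct region *out, size_t max)
{
    assert(out != NULL);
    size_t n = 0;
    while (size > 0 && n < max) {
        unsigned bits = 63;
        /* Largest power of two that still fits in the remaining size. */
        while ((UINT64_C(1) << bits) > size) {
            bits--;
        }
        /* Shrink until base is aligned to the region size. */
        while (bits > 0 && (base & ((UINT64_C(1) << bits) - 1)) != 0) {
            bits--;
        }
        out[n].base = base;
        out[n].bits = (uint8_t)bits;
        n++;
        base += UINT64_C(1) << bits;
        size -= UINT64_C(1) << bits;
    }
    return n;
}
```

For example, 0x5000 bytes starting at 0x3000 would be covered by a 4 KiB region at 0x3000 followed by a 16 KiB region at 0x4000.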
2.6. Scheduling
Upon reception of a timer interrupt, the kernel calls schedule(), which selects the next dispatcher to run. At the moment, a simple round-robin scheduler is implemented that walks a
circular singly-linked list forever.
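A round-robin walk over a circular singly-linked list can be sketched as follows. The `struct dcb` shown here is a stand-in with only a link and an identifier, and `ring_insert` is a hypothetical helper; the kernel's real DCB and the signature of its schedule() are not described in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a dispatcher control block: only what the walk needs. */
struct dcb {
    struct dcb *next; /* next dispatcher in the circular list */
    int id;           /* illustrative identifier */
};

/* Hypothetical helper: links d into the ring right after *ring. */
void ring_insert(struct dcb **ring, struct dcb *d)
{
    assert(ring != NULL && d != NULL);
    if (*ring == NULL) {
        d->next = d; /* a single element points to itself */
        *ring = d;
    } else {
        d->next = (*ring)->next;
        (*ring)->next = d;
    }
}

/* Round-robin: the next dispatcher to run is the successor of current. */
struct dcb *schedule(struct dcb *current)
{
    assert(current != NULL && current->next != NULL);
    return current->next;
}
```

Because the list is circular, repeatedly calling `schedule` visits every dispatcher and then starts over, which matches the "walks forever" behaviour described above.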
2.7. TODO
• virtual machine support
• timers
• resource management
• thread migration
• event tracing / performance monitoring
3. Barrelfish Library API
[TODO: documentation of libbarrelfish]
3.1. Capabilities
3.1.1. Data types
cap_info cnode_info
get_cap_valid_bits get_cap_addr get_cnode_valid_bits get_cnode_addr build_cnode_info ?
3.1.2. Functions
cap_copy cap_mint cap_retype cnode_create cnode_create_raw ?
ram_alloc
async_endpoint_create local_endpoint_create
3.1.3. Invocations
invoke_*
3.1.4. Syscalls
syscall sys_yield cap_invoke cap_invoke_wait
3.2. VSpace management
struct vnode struct vlist
vspace_alloc vspace_map vspace_map_raw ? vspace_free vspace_alloc_map vspace_map_attr
vspace_map_attr_raw ?
3.3. Dispatch and threading
3.4. Spawning domains
3.4.1. Initial capability space
The initial capability space of other domains is similar, but lacks the other cnodes in the root
cnode, as illustrated in Figure 3.1.
[Figure 3.1: CSpace layout diagram. Labels recovered from the diagram (grouping inferred from the extracted text):
DCB: refers to rootcn and dispatcher
CSpace address prefix: 0...0 (20 bits)
rootcn: 0x0 taskcn, 0x1 pagecn, 0x2 smallcn, ..., 0x4 segcn
taskcn: 0x0 NULL, 0x1 DCB, 0x2 rootcn, 0x4 dispframe, 0x5 IRQTable, 0x6 IO, 0x7 BootInfo, 0x8 Kernel, 0x9 VMM request EP, 0xa self EP, 0xb Args frame, 0xc Init EP, ...
pagecn: 0x0 PML4, 0x1 PDPT, ..., PDIR, ..., PTABLE, ...]
Figure 3.1.: initial capability space layout of user tasks
A. Glossary
Capability Every kernel object is represented by a capability, allowing the user who holds
that capability to manipulate it. We use partitioned capabilities: capabilities are stored in
memory accessible only to the kernel, and are manipulated or invoked through the use of
addresses in the CSpace.
CSpace The capability address space, in which all capabilities reside, is constructed and managed by user-space code through a hierarchy of page table-like structures, called CNodes. The protection domain of user code is determined by the capabilities existing in its
CSpace.
VSpace The virtual address space
Dispatcher Kernel-scheduled entity, responsible for scheduling/managing the execution of
user code. Dispatchers are identified by DCB capabilities. Every dispatcher has a CSpace
and VSpace pointer, which determine its protection domain and virtual address space.
Multiple dispatchers may share a CSpace or VSpace.
Domain Although not directly part of the Barrelfish kernel API, the word domain is used to
refer to the user-level code sharing a protection domain and (usually) an address space. A
domain consists of one or more dispatchers.
DCB Dispatcher control block, the kernel object associated with a dispatcher, and therefore
also one of the system’s capability types. [I’d prefer to avoid this term, as it can be
confusing. -AB]
IDC Inter-dispatcher communication, the kernel-mediated message-passing primitive. There
are two types of IDC: the general case of asynchronous IDC, and an optimised local IDC
variant possible only when both the sender and receiver execute on the same core.
Endpoint A type of capability that, when invoked, performs an IDC. There are two endpoint
types (asynchronous and local) to match the two types of IDC.
Channel A uni-directional kernel-mediated communication path between dispatchers. All
messages travel over channels. Holding a capability for a channel guarantees the right
to send a message to it (although the message may not be sent for reasons other than
protection).
Mapping Database The mapping database is used to facilitate retype and revoke operations.
A capability that is not of type dispatcher can only be retyped once. The mapping database
facilitates this check.
When a capability is revoked, all its descendants and copies are deleted. The mapping
database keeps track of descendants and copies of a capability, allowing for proper execution of a revoke operation.
Each core has a single private mapping database. All capabilities on the core must be
included in the database.
Descendant A capability X is a descendant of a capability A if:
• X was retyped from A,
• or X is a descendant of A1 and A1 is a copy of A,
• or X is a descendant of B and B is a descendant of A,
• or X is a copy of X1 and X1 is a descendant of A.
Ancestor A capability A is an ancestor of a capability X if X is a descendant of A.
B. Implementation
This chapter covers the implementation and algorithm of some subsystems.
B.1. Mapping Database
This section describes the mapping database in more detail. It covers the algorithms including
implementation details and invariants on the database.
B.1.1. Implementation details
The database implements the following functions:
• is copy Checks if two capabilities are copies of each other. Two capabilities are copies if
they are of the same type and they refer to the same kernel object. Capabilities of types
PhysAddr, RAM, Frame, DevFrame, CNode, Dispatcher, Kernel and EndPoint explicitly reference
kernel objects, so capabilities of such types can be tested simply. We cannot handle other
capability types yet: when comparing two VNode capabilities, we always return false, and when
comparing two IO or IRQTable capabilities, we always return true.
• is ancestor Checks if one capability is a parent of another. In our current implementation,
some capability types cannot have descendants and some capability types cannot have
ancestors. For the rest, we check if the parent-child relationship is possible based on the
retyping allowed, and check if the kernel object the child refers to is included in the
range of kernel objects the parent refers to.
• has descendants Checks if a capability has any descendants. Walks the entire database
checking if the capability has any descendants. The function returns true when the first
descendant is found, and returns false if a capability other than a copy is found.
• has copies Checks if a capability has any copies. Walks the entire database checking if the
capability has any copies. The function returns true when the first copy is found, and returns
false if a capability other than a descendant is found.
• insert after Inserts a set of contiguous capabilities after the given capability.
• insert before Inserts a set of contiguous capabilities before the given capability.
• set init mapping Inserts a capability into the database in the appropriate location. If any
copies or ancestors of the capability exist, the capability is inserted after a copy or after
the closest ancestor. If no relatives exist in the database, the capability is inserted at the
top of the database.
• remove mapping Removes the capability from the database.
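The is copy and is ancestor checks described in the list above can be sketched on a simplified capability representation. Everything here is an assumption made for illustration: a capability is reduced to a type plus a physical range, "same kernel object" is approximated by an identical range, only a small subset of types is modelled, and the single retype rule (RAM to Frame) is hypothetical, since the permitted retypings are not listed in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of capability types. */
enum cap_type { CAP_RAM, CAP_FRAME, CAP_VNODE, CAP_IO };

/* Simplified capability: a type plus the object range it refers to. */
struct cap {
    enum cap_type type;
    uint64_t base;
    uint64_t size;
};

/* Copies have the same type and refer to the same kernel object. */
bool is_copy(const struct cap *a, const struct cap *b)
{
    assert(a != NULL && b != NULL);
    if (a->type != b->type) {
        return false;
    }
    switch (a->type) {
    case CAP_VNODE:
        return false; /* not handled yet: always false */
    case CAP_IO:
        return true;  /* not handled yet: always true */
    default:
        return a->base == b->base && a->size == b->size;
    }
}

/* Ancestry needs a permitted retype and a child range inside the parent. */
bool is_ancestor(const struct cap *child, const struct cap *parent)
{
    assert(child != NULL && parent != NULL);
    /* Hypothetical retype rule: only RAM may be retyped, into Frame. */
    if (!(parent->type == CAP_RAM && child->type == CAP_FRAME)) {
        return false;
    }
    return child->base >= parent->base &&
           child->base + child->size <= parent->base + parent->size;
}
```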
B.1.2. Invariants
The following invariants on the database must hold at all times.
1. The next and prev pointers on a capability are never NULL.
2. There is only one database per core. Any capability on a core can be reached from another
on the core.
3. Two separate databases do not share any capabilities. The set of next and prev pointers on
one database is disjoint from the set of next and prev pointers on another.
4. Each capability on a core is on the database of the core. A capability will eventually be
visited by starting at any other capability and walking the database.
5. The database is circular. Walking in either direction from any capability, the same capability will eventually be reached again.
6. The head of the database cannot have any ancestors.
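Invariants 1 and 5 lend themselves to a simple consistency check. The sketch below assumes the database is a doubly-linked ring of nodes with next and prev pointers; the node type `mdb_node`, the function name and the `limit` parameter (an upper bound on the walk so that a corrupted list cannot loop forever) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical database node: only the link pointers are modelled. */
struct mdb_node {
    struct mdb_node *next;
    struct mdb_node *prev;
};

/* Walks forward from start for at most limit steps. Returns true if no
 * NULL link is seen (invariant 1), every forward link is mirrored by the
 * matching backward link, and the walk returns to start (invariant 5). */
bool mdb_check_ring(const struct mdb_node *start, size_t limit)
{
    assert(start != NULL);
    const struct mdb_node *n = start;
    for (size_t i = 0; i < limit; i++) {
        if (n->next == NULL || n->prev == NULL) {
            return false;
        }
        if (n->next->prev != n) {
            return false; /* forward and backward links disagree */
        }
        n = n->next;
        if (n == start) {
            return true;
        }
    }
    return false; /* did not close the ring within limit steps */
}
```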
B.1.3. Current Limitations
The database has the following limitations.
1. It does not handle VNode capabilities and other types. Other types are not as crucial, but
VNode will become a priority shortly.
2. No indexing for quickly inserting brand new capabilities. The implementation starts at the
head of the database and traverses the entire database till an appropriate location is found.
3. This is not necessarily a limitation, but an important note: the current implementation can report
certain capabilities as descendants when one could have been created by copying another,
and can report a descendant as a copy when it was created by retyping the ancestor. This
requires the implementation of some functions to test for relationships in the correct order.
This may lead to some unforeseen issues later.
C. Architecture-Specific Features
This chapter covers features specific to one implementation of Barrelfish on a specific hardware
architecture.
C.1. x86-64
The x86-64 implementation of Barrelfish is specific to the AMD64 and Intel 64 architectures.
This text will refer to features of those architectures. Those and further features can be found
in [2] and [1] for the Intel 64 and AMD64 architectures, respectively.
C.1.1. VSpace
The page table is constructed by copying VNode capabilities into VNodes to link intermediate
page tables, and minting Frame / DeviceFrame capabilities into leaf VNodes to perform mappings.
When minting a frame capability to a VNode, the frame must be at least as large as the smallest
page size. The type-specific parameters are:
1. Access flags: The permissible set of flags is PTABLE_GLOBAL_PAGE | PTABLE_ATTR_INDEX
| PTABLE_CACHE_DISABLED | PTABLE_WRITE_THROUGH. Access flags are set
from frame capability access flags. All other flags are not settable from user-space (like
PRESENT and SUPERVISOR).
2. Number of base-page-sized pages to map: If non-zero, this parameter allows the caller
to prevent the entire frame capability from being mapped, by specifying the number of
base-page-sized pages of the region (starting from offset zero) to map.
C.1.2. IO capabilities
IO capabilities provide kernel-mediated access to the legacy IO space of the processor. Each IO
capability allows access only to a specific range of ports.
The Mint invocation (see ??) allows the permissible port range to be reduced (with the lower
limit in the first type-specific parameter, and the upper limit in the second type-specific parameter).
At boot, an IO capability for the entire port space is passed to the initial user domain. Aside
from being copied or minted, IO capabilities may not be created.
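The range reduction performed by minting an IO capability can be sketched as a bounds check. The representation of an IO capability as an inclusive 16-bit port range and the names `io_cap` and `io_cap_mint` are assumptions; only the rule that the new range must lie within the existing one is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IO capability: an inclusive range of legacy IO ports. */
struct io_cap {
    uint16_t start;
    uint16_t end;
};

/* Derives dst from src with the range reduced to [lo, hi]. Fails if the
 * requested range is empty or would exceed what src permits. */
bool io_cap_mint(const struct io_cap *src, uint16_t lo, uint16_t hi,
                 struct io_cap *dst)
{
    assert(src != NULL && dst != NULL);
    if (lo > hi || lo < src->start || hi > src->end) {
        return false;
    }
    dst->start = lo;
    dst->end = hi;
    return true;
}
```

Starting from a capability for the entire port space, a driver could for instance be handed a capability restricted to the ports of a single device, which it can narrow further but never widen.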
C.1.3. Interrupts and Exceptions
Interrupts
The lower 32 interrupts are reserved as CPU exceptions. Thus, there are 224 hardware interrupts,
ranging from IRQ number 32 to 255.
The kernel delivers an interrupt that is not an exception and not the local APIC timer interrupt
to user-space. The local APIC timer interrupt is used by the kernel for preemptive scheduling
and not delivered to user-space.
Exceptions
The lower 32 interrupts are reserved as CPU exceptions. Except for a double fault exception,
which is always handled by the kernel directly, an exception is forwarded to the dispatcher
handling the domain on the CPU on which it appeared.
Page faults (interrupt 14) are dispatched to the ‘pagefault‘ entry point of the dispatcher. All
other exceptions are dispatched to the ‘trap‘ entry point of the dispatcher.
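The routing rules of this section can be summarised in one function. This is a sketch, not the kernel's interrupt handler: the `vector_route` names are hypothetical, and the vector used for the local APIC timer is passed in as a parameter because its value is not given here. Vector 8 for the double fault and vector 14 for the page fault are the standard x86 assignments.

```c
#include <assert.h>

#define NUM_EXCEPTIONS   32 /* vectors 0..31 are CPU exceptions */
#define VEC_DOUBLE_FAULT 8  /* x86 double fault */
#define VEC_PAGE_FAULT   14 /* x86 page fault */

/* Hypothetical classification of where a vector is handled. */
enum vector_route {
    ROUTE_KERNEL,          /* handled by the kernel itself */
    ROUTE_PAGEFAULT_ENTRY, /* dispatcher's pagefault entry point */
    ROUTE_TRAP_ENTRY,      /* dispatcher's trap entry point */
    ROUTE_USER_IRQ         /* delivered to user-space as an interrupt */
};

/* Classifies an interrupt vector in the range 0..255. */
enum vector_route route_vector(unsigned vec, unsigned apic_timer_vec)
{
    assert(vec < 256);
    if (vec < NUM_EXCEPTIONS) {
        if (vec == VEC_DOUBLE_FAULT) {
            return ROUTE_KERNEL;
        }
        if (vec == VEC_PAGE_FAULT) {
            return ROUTE_PAGEFAULT_ENTRY;
        }
        return ROUTE_TRAP_ENTRY;
    }
    if (vec == apic_timer_vec) {
        return ROUTE_KERNEL; /* used for preemptive scheduling */
    }
    return ROUTE_USER_IRQ;
}
```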
Bibliography
[1] Advanced Micro Devices. AMD64 Architecture Programmer’s Manual, September 2007.
[2] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, September 2008.