SICS Technical Report
T2009:14A
ISSN: 1100-3154

Secure Virtualization and Multicore Platforms: State-of-the-Art Report1

Heradon Douglas and Christian Gehrmann
Swedish Institute of Computer Science (SICS)
Box 1263, SE-164 29 Kista, SWEDEN

2009-12-08

1 We gratefully acknowledge the support of the Swedish Governmental Agency for Innovation Systems, VINNOVA, under grant 2009-01753.
Table of Contents

1. Introduction
2. Virtualization technologies
   2.1. What is virtualization?
   2.2. Virtualization Basics
      2.2.1. Interfaces
         2.2.1.1. Instruction Set Architecture (ISA)
         2.2.1.2. Device drivers
         2.2.1.3. Application Binary Interface (ABI)
         2.2.1.4. Application Programming Interface (API)
         2.2.1.5. Interfaces, abstraction, and virtualization
   2.3. Types of virtualization
      2.3.1. Process virtualization
      2.3.2. System virtualization
      2.3.3. ISA translation
      2.3.4. Paravirtualization
      2.3.5. Pre-virtualization
      2.3.6. Containers
   2.4. Non-standard systems
3. Hypervisors
   3.1. Traditional hypervisors
      3.1.1. Protection rings and modes
   3.2. Hosted hypervisors
   3.3. Microkernels
   3.4. Thin hypervisors
4. Advantages of System Virtualization
   4.1. Isolation
   4.2. Minimized trusted computing base
   4.3. Architectural flexibility
   4.4. Simplified development
   4.5. Management
      4.5.1. Consolidation/Resource sharing
      4.5.2. Load balancing and power management
      4.5.3. Migration
   4.6. Security
   4.7. Typical Virtualization Scenarios
      4.7.1. Hosting center
      4.7.2. Desktop
      4.7.3. Service provider
      4.7.4. Mobile/embedded
5. Hardware Support for Virtualization
   5.1. Basic virtualization requirements
   5.2. Challenges in x86 architecture
   5.3. Intel VT
      5.3.1. VT-x
      5.3.2. VT-d
   5.4. AMD-V
      5.4.1. CPU
      5.4.2. Memory
      5.4.3. Migration
      5.4.4. I/O
   5.5. ARM TrustZone
6. Hypervisor-based security architectures
   6.1. Advantages
   6.2. Virtualization security challenges
   6.3. Architectural limitations
      6.3.1. The semantic gap
      6.3.2. Interposition granularity
   6.4. Architectural patterns
      6.4.1. Augmented traditional hypervisor
      6.4.2. Security VM
      6.4.3. Microkernel application
      6.4.4. Thin hypervisor
   6.5. Isolation-based services
      6.5.1. Isolation architectures
      6.5.2. Kernel code integrity
      6.5.3. Memory multi-shadowing
      6.5.4. Protecting against a malicious OS
      6.5.5. I/O Security
      6.5.6. Componentization
      6.5.7. Mandatory Access Control (MAC)
      6.5.8. Instruction set virtualization
   6.6. Monitoring-based services
      6.6.1. Attestation
      6.6.2. Malware analysis
      6.6.3. Intrusion detection
      6.6.4. Forensics
      6.6.5. Execution logging and replay
   6.7. Alternatives
7. Multicore systems
   7.1. Why multicore?
   7.2. Hardware considerations
      7.2.1. Core count and complexity
      7.2.2. Core heterogeneity
      7.2.3. Memory hierarchy
      7.2.4. Interconnects (core communication)
      7.2.5. Extended instruction sets
      7.2.6. Other concerns
   7.3. Software considerations
      7.3.1. Programming models
      7.3.2. Programming tools
      7.3.3. Locality
      7.3.4. Load-balancing and scheduling
   7.4. Interesting multicore architectures
      7.4.1. The Barrelfish multikernel
      7.4.2. Configurable isolation
      7.4.3. Mixed-Mode Multicore (MMM) reliability
   7.5. Multicore and virtualization
      7.5.1. Multicore virtualization architectures
         7.5.1.1. Managing dynamic heterogeneity
         7.5.1.2. Sidecore
8. References
Summary
Virtualization, the use of hypervisors or virtual machine monitors to support multiple virtual
machines on a single real machine, is quickly growing in popularity, due to its
benefits of increased hardware utilization and system management flexibility, and because of
increasing hardware and software support for virtualization in commodity platforms. With the
hypervisor providing an abstraction layer separating virtual machines from the real hardware,
and isolating virtual machines from each other, many useful architectural possibilities arise.
In addition to hardware utilization and system management, virtualization has been shown to be
a strong enabler for security -- both as a result of the isolation enforced by the hypervisor
between virtual machines, and due to the hypervisor's high-privilege suitability as a strong base
for security services provided for the virtual machines.
Additionally, multicore is quickly gaining prevalence, with all manner of systems shifting to
multicore hardware. Virtualization presents both opportunities and challenges with multicore
hardware -- while the layer of abstraction provided by the hypervisor affords a unique
opportunity to manage multicore complexity and heterogeneity beneath the virtual machines,
supporting multicore in the hypervisor in a robust and secure way is not a trivial task.
This report gives an overview of the state of the art regarding virtualization, multicore systems,
and security. The report is a major deliverable of the SVaMP project pre-study and will serve as
a basis for in-depth analysis of a selected set of multicore target systems in the second phase of
the project. Starting from the state-of-the art designs described in this report, the second phase of
the project will also identify design patterns and derive system models for secure virtualized
multicore systems.
Abbreviations
ABI      Application Binary Interface
API      Application Programming Interface
ASID     Address Space Identifier
CPU      Central Processing Unit
DMA      Direct Memory Access
DMAC     DMA Controller
DMR      Dual-Modular Redundancy
DRM      Digital Rights Management
EPT      Extended Page Table
I/O      Input/Output
IOMMU    I/O Memory Management Unit
IPC      Interprocess Communication
ISA      Instruction Set Architecture
MAC      Mandatory Access Control
MMM      Mixed-Mode Multicore reliability
MMU      Memory Management Unit
NUMA     Non-Uniform Memory Architecture
SPMD     Single-Program, Multiple Data
TCB      Trusted Computing Base
TCG      Trusted Computing Group
TLB      Translation Lookaside Buffer
TPR      Task Priority Register
VBAR     Vector Base Address Register
VM       Virtual Machine
VMCS     Virtual Machine Control Structure
VMI      Virtual Machine Introspection
VMM      Virtual Machine Monitor
VPID     Virtual Processor Identifier
1. Introduction
This report gives an overview of virtualization technologies and recent research results in the
area. The purpose of the report is to provide the foundation for the SVaMP project platform
analysis, requirements, and modeling work.
The report is organized as follows. First, in Section 2, we give basic definitions regarding
virtualization and the technologies behind virtualization. Section 3 discusses different
hypervisor/virtual machine monitor architectures. In Section 4, we explain the major
motivations for introducing virtualization in a system. Section 5 describes important
virtualization-enabling hardware architectures. In Section 6, we discuss different hypervisor-protected
software architectures, focusing on well-known designs and on the description of hypervisor-based
platform security services. Finally, in Section 7, an overview of multicore systems and
related issues is given, treating in particular virtualization in relation to multicore systems.
2. Virtualization technologies
2.1. What is virtualization?
Virtualization is a computer system abstraction, in which a layer of virtualization logic manages
and provides ``virtualized" resources to a client layer running above it. The client accesses
resources using standard interfaces, but the interfaces do not communicate with the resources
directly; instead, the virtualization layer manages the real resources and possibly multiplexes
them among more than one client.
The virtualization layer resides at a higher privilege level than the clients, and can interpose
between the clients and the hardware. This means that it can intercept important instructions and
events and handle them specially before they are executed or handled by the hardware. For
example, if a client attempts to execute an instruction on a virtual device, the virtualization layer
may have to intercept that instruction and implement it in a different way on the real resources in
its control. Each client is presented with the illusion of having sole access to its resources, thanks
to the management performed by the virtualization layer. The virtualization layer is responsible
for maintaining this illusion and ensuring correctness in the resource multiplexing. Virtualization
therefore promotes efficient resource utilization via sharing among clients, and furthermore
maintains isolation between clients (who need not know of each other's existence). Virtualization
also serves to abstract the real resources to the client, which decouples the client from the real
resources, facilitating greater architectural flexibility and mobility in system design.
For these reasons, virtualization technology has become more prominent, and its viable uses
have expanded. Today virtualization is used in enterprise systems, service providers, home
desktops, mobile devices, and production systems, among other venues.
Oftentimes, the client in a virtualization system is known as the guest.
2.2. Virtualization Basics
2.2.1. Interfaces
An excellent overview of virtual machines is found in [79], and in a book by the same authors
[80]. The article discusses, in part, how virtualization can be understood in terms of the
interfaces present at different levels of a typical computer system. Interfaces offer different levels
of abstraction which clients use to access resources. Virtualization technology exposes an
expected interface, but behind the scenes is virtualizing resources accessed by the interface -- for
example, in the case of a disk input/output interface, the ``disk" that the interface provides access
to may actually be a file on a real disk when implemented by a virtualization layer. A discussion
of important interfaces in a typical computer system follows.
2.2.1.1. Instruction Set Architecture (ISA)
The ISA is the lowest level instruction interface that communicates directly with hardware.
Software may be interpreted by intermediaries, for example a Java Virtual Machine or .NET
runtime, or a script interpreter for scripting languages like Perl or Python, or it may be compiled
from a high-level programming language like C, and the software may utilize system calls that
execute code found in the operating system kernel, but in the end all software is executed
through the ISA. In a typical system, some of the ISA can be used directly by applications, but
another part of the ISA (usually that dealing with critical system resources) is only available to
the higher-privileged operating system. If unprivileged software attempts to use a restricted
portion of the ISA, the instruction will ``trap" to the privileged operating system.
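
As a concrete illustration, the following minimal C sketch (x86, GCC inline assembly) attempts to execute a privileged instruction -- reading control register CR0 -- from an ordinary user process. The program and its behavior are illustrative only.

    /* Attempt a privileged instruction from user space.  Reading CR0 is
     * part of the privileged x86 ISA; executed outside ring 0 it raises a
     * general-protection fault, which the OS typically delivers to the
     * process as a fatal signal (e.g. SIGSEGV on Linux). */
    #include <stdio.h>

    int main(void)
    {
        unsigned long cr0;
        /* This instruction traps when run as an ordinary user process. */
        __asm__ volatile("mov %%cr0, %0" : "=r"(cr0));
        printf("cr0 = %#lx\n", cr0);   /* never reached in user mode */
        return 0;
    }

In a virtualized system, it is precisely such trapping instructions that the virtualization layer intercepts and emulates on behalf of its guests.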
2.2.1.2. Device drivers
Device drivers are a software interface provided by device vendors to enable the operating
system to control devices (hard drives, graphics cards, etc.). Device drivers often reside in the
operating system kernel and run at high privilege, and are hence part of the trusted computing
base in traditional systems -- but as they are not always written with ideal security or robustness,
they constitute a dominant source of operating system errors [30].
2.2.1.3. Application Binary Interface (ABI)
The ABI is the abstracted interface to system resources that the operating system exposes to
clients (applications). The ABI typically consists of system calls. Through system calls,
applications can obtain access to system resources mediated by the operating system. The
operating system ensures the access is permitted and grants it in a safe manner. The ABI can
remain consistent across different hardware platforms since the operating system handles the
particularities of the underlying hardware, thus exposing a common interface regardless of
platform differences.
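
To make the ABI boundary concrete, the following Linux-specific C sketch (assuming glibc's syscall() wrapper) issues the same request twice: once through an ordinary library call and once directly through the system-call interface that constitutes the ABI.

    /* The same "write to stdout" request at two interface levels. */
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        /* Library level: the libc wrapper function. */
        write(1, "via API\n", 8);
        /* ABI level: invoke the kernel's system-call entry point directly. */
        syscall(SYS_write, 1, "via ABI\n", 8);
        return 0;
    }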
2.2.1.4. Application Programming Interface (API)
An API provides a higher level of abstraction than the ABI. Functionality is provided to
applications in the form of external code ``libraries" that are accessed using a function call
interface. This abstraction can facilitate a common interface for applications not only across
different hardware platforms (as with the ABI), but also across different operating systems, since
the API can be reimplemented as necessary for each ABI. Furthermore, APIs can be built on top
of other APIs, making it at least possible that only the lower-level APIs will have to be
reimplemented to be used on a new operating system. (In reality, however, depending on the
language used to implement the library, it doesn't usually work out so ideally.) As previously
mentioned, however, all software is executed through the ISA in the end -- meaning that any API
or application will have to be recompiled, even if it doesn't have to be reimplemented, as it
moves to a new platform.
2.2.1.5. Interfaces, abstraction, and virtualization
Each of these interface levels represents an opportunity for virtualization, since clients of an
interface depend only on the structure and behavior of the interface (also known as its contract),
and not its implementation. Here we see the idea of abstraction. Abstraction concerns providing
a convenient interface to clients, and can be understood as follows -- an application asking an
operating system for a TCP/IP network connection most likely does not care if the connection is
formed over a wireless link, a cellular radio, or an ethernet cable, or if TCP semantics are
achieved using other protocols, and it does not care about the network card model or the exact
hardware instructions needed to set up and tear down the connection. The operating system deals
with all these issues, and presents the application with a handle to a convenient TCP/IP
connection that adheres to the interface contract, but may be implemented under the surface in
numerous ways. Abstraction enables clients to use resources in a safe and easy manner, saving
time and effort for common tasks. Virtualization, however, usually means more than just
abstraction; it implies more about the nature of what lies behind the abstraction. A virtualization
layer not only preserves abstraction for its clients, but may also use intermediate structures and
abstractions between the real resources and the virtual resources it presents to clients [79] -- such
as using files on a real disk to simulate virtual disks, or using various resources and techniques
above the physical memory to simulate private address spaces. And it may multiplex resources
(such as the CPU) among multiple clients, presenting each client with a picture of the resource
corresponding to the client's own context, creating in effect more instances of the resource than
exist in actuality.
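
As a small illustration of the file-backed virtual disk idiom mentioned above, the following C sketch maps reads of a guest's ``sectors" onto offsets in an ordinary host file. The function and file names are hypothetical and not taken from any real VMM.

    #include <stdio.h>

    #define SECTOR_SIZE 512

    /* Read virtual sector 'lba' of the guest's disk from the backing file. */
    static int vdisk_read(FILE *backing, unsigned long lba, void *buf)
    {
        if (fseek(backing, (long)(lba * SECTOR_SIZE), SEEK_SET) != 0)
            return -1;
        return fread(buf, 1, SECTOR_SIZE, backing) == SECTOR_SIZE ? 0 : -1;
    }

    int main(void)
    {
        FILE *backing = fopen("guest-disk.img", "rb");  /* hypothetical image */
        unsigned char sector[SECTOR_SIZE];
        if (backing && vdisk_read(backing, 0, sector) == 0)
            printf("first byte of virtual sector 0: 0x%02x\n", sector[0]);
        if (backing)
            fclose(backing);
        return 0;
    }

The guest sees a disk; the host sees nothing but ordinary file I/O.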
2.3. Types of virtualization
The two most prominent basic types of virtualization are process virtualization and system
virtualization [79]. Also noteworthy are binary translation, paravirtualization, and
pre-virtualization (approaches to system and process virtualization), as well as containers, a more
lightweight relative of system virtualization. These concepts illustrate the basic types of
virtualization currently in use.
2.3.1. Process virtualization
Process-level virtualization is a fundamental concept in virtually every modern mainstream
computer system. In process virtualization, an operating system virtualizes the memory address
space, central processing unit (CPU), CPU registers, and other system resources for each running
process. Each process interacts with the operating system using a virtual ABI or API, unaware of
the activities of other processes [79].
The operating system manages the virtualization and maintains the context for each process. For
instance, in a context switch, the operating system must swap in the register values for the newly
scheduled process, so that the process can begin executing where it left off. The operating system
typically has a scheduling algorithm to ensure that every process gets a fair share of CPU time,
thereby maintaining the illusion of sole access to the CPU. Through virtual memory, each
process has the illusion of its own independent address space, in which its own data and code as
well as system and application libraries are accessible. A process can't access the address space
of another process. The operating system achieves virtualization of memory through the use of
page tables, which translate the virtual memory pages in processes' virtual address space to
actual physical memory pages. To map a virtual address to a physical address, the operating
system conducts a ``page table walk" and finds the physical page corresponding to the virtual
page in question. In this way, different processes can even access the same system libraries in the
same physical locations, but in different virtual pages in their own address spaces. A process
simply sees a long array of bytes, whereas underneath, some or all of those bytes may be loaded
into different physical memory pages or stored in the backing store (usually on a hard drive).
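
The following C sketch shows such a walk in software, using an illustrative 32-bit, two-level layout (10-bit directory index, 10-bit table index, 12-bit page offset). On real hardware the MMU performs this walk itself; the structures and flag values here are simplified for illustration.

    #include <stdint.h>

    #define PRESENT     0x1u
    #define FRAME_MASK  0xFFFFF000u

    typedef uint32_t pte_t;

    /* Translate a virtual address via the given page directory, or
     * return 0 to signal a page fault (page not present). */
    uint32_t walk(pte_t *page_directory, uint32_t vaddr)
    {
        pte_t pde = page_directory[vaddr >> 22];          /* top 10 bits  */
        if (!(pde & PRESENT))
            return 0;                                     /* page fault   */
        pte_t *table = (pte_t *)(uintptr_t)(pde & FRAME_MASK);
        pte_t pte = table[(vaddr >> 12) & 0x3FF];         /* next 10 bits */
        if (!(pte & PRESENT))
            return 0;                                     /* page fault   */
        return (pte & FRAME_MASK) | (vaddr & 0xFFF);      /* frame+offset */
    }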
Furthermore, a modern processor typically has multiple cache levels (termed the L1 cache, L2
cache, and so on) where recently or frequently used memory contents can be stored to enhance
retrieval performance -- the closer a cache sits to the processor, the smaller its capacity but the greater its speed.
(A computer system memory hierarchy can often be visualized as a pyramid, with slower, lower
cost, higher capacity storage media at the bottom, and faster, higher cost, lesser capacity media at
the top.) And, a CPU typically also uses other specialized caches and chips, such as a Translation
Lookaside Buffer (TLB) that caches translations from virtual page numbers to physical page
numbers (that is, the results of page table walks). Virtual memory is thus the outward-facing
facade of a complex internal system of technologies.
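
The TLB's role can be sketched as a simple cache placed in front of that walk. The direct-mapped structure below is purely illustrative and assumes a walk() routine such as the one sketched above; real TLBs are hardware structures with associativity and replacement policies.

    #include <stdint.h>

    #define TLB_ENTRIES 64

    struct tlb_entry { uint32_t vpn; uint32_t pfn; int valid; };
    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Page-table walk, assumed from the previous sketch. */
    extern uint32_t walk(uint32_t *page_directory, uint32_t vaddr);

    uint32_t translate(uint32_t *page_directory, uint32_t vaddr)
    {
        uint32_t vpn = vaddr >> 12;
        struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn)                    /* TLB hit      */
            return (e->pfn << 12) | (vaddr & 0xFFF);
        uint32_t paddr = walk(page_directory, vaddr);     /* miss: walk   */
        if (paddr) {
            e->vpn = vpn; e->pfn = paddr >> 12; e->valid = 1;
        }
        return paddr;
    }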
In short, processes interact obliviously with virtual memory and other resources through standard
ABI and APIs, while the operating system manages the virtualization and multiplexing of
resources under the hood.
2.3.2. System virtualization
In contrast to process virtualization, in system virtualization an entire system is virtualized,
enabling multiple virtual systems to run isolated alongside each other [79]. A hypervisor or
Virtual Machine Monitor (VMM) virtualizes all the resources of a real machine, including CPU,
devices, memory, and processes, creating a virtual environment known as a Virtual Machine
(VM). Software running in the virtual machine has the illusion of running in a real machine, and
has access to all the resources of a real machine through a virtualized ISA. The hypervisor
manages the real resources, and provides them to the virtual machines. The hypervisor may
support one or more virtual machines, and thus is responsible for making sure all real machine
resources are properly managed and shared, and for maintaining the illusion of the virtual
resources presented to each virtual machine (so that each virtual machine ``thinks" it has its own
real machine).
Note here that the VMM may divide the system resources in different ways. For instance, if there
are multiple CPU cores, it may allocate specific cores to specific VMs in a fixed manner, or it
may adopt a dynamic scheme where cores are assigned and unassigned to VMs flexibly, as
needed. (This is similar to how an operating system allocates the CPU to its processes via its
scheduling algorithm.) The same goes for memory usage -- portions of memory may be statically
allocated to VMs, or memory may be kept in a ``pool" that is dynamically allocated to and
deallocated from VMs. Static allocation of cores and memory is simpler, and results in stronger
isolation, but dynamic allocation may result in better utilization and performance [79].
Virtualization of this standard type has been around for decades, and is increasing quickly in
popularity today, thanks to the flexibility and cost-saving benefits it confers on organizations
[89], as well as due to commodity hardware support discussed in section 5. Note as well that it is
expanding from its traditional ground (the data center) and into newer areas such as security and
mobile/embedded applications [54].
2.3.3. ISA translation
If the guest and virtualization host utilize the same ISA, then no ISA translation is necessary.
Clearly, running the host and guest with the same ISA and thus not requiring translation is
simpler, and better for performance. Scenarios do arise, however, in which the guest uses a
different ISA than the host. In these cases, the host must translate the guest's ISA. Both process
and system virtualization layers can translate the ISA; a VMM supporting ISA translation is
sometimes known as a ``Whole System" VMM [79].
ISA translation can enable operating systems compiled for one type of hardware to run on a
different type of hardware. Therefore, it enables a software stack for one platform to be
completely transitioned to a new type of hardware. This may be quite useful. For example, if a
company requires a large legacy application but lacks the resources to port it to new hardware,
they can use a whole system VMM. Another example of the benefits of ISA translation might be
if an ISA has evolved in a new or branching CPU line, but older software should still be
supported -- systems such as the IA32 Execution Layer, or IA32-EL ([18]), which supports
execution of Intel IA-32 compatible software on Itanium processors, can be used. Alternatively,
if a company develops for multiple hardware platforms, whole-system VMMs can facilitate
multiple-ISA development environments consolidated on a single workstation. However, as
already mentioned, ISA translation will likely degrade performance.
A virtualization system may translate or optimize the guest ISA in different ways [79]. Through
interpretation, an emulator runs a binary compiled for one ISA by reading the instructions one
by one and translating them to a different ISA compatible with the underlying system. Through
dynamic binary translation, blocks of instructions are translated at once and cached for later,
resulting in higher performance than interpretation. Even if the guest and host run the same ISA,
the virtualization layer may also seek to dynamically optimize the binary code, as in the case of
the HP Dynamo system [17].
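
The control flow of an interpreter (as opposed to a block-at-a-time translator) can be sketched as follows. The two-opcode ``guest ISA" is invented purely for illustration; a real emulator decodes an actual hardware instruction encoding.

    #include <stdio.h>

    enum { OP_ADD, OP_PRINT, OP_HALT };        /* invented guest ISA */

    struct insn { int op; int arg; };

    /* Interpretation: fetch, decode, and emulate one instruction at a time. */
    static void interpret(const struct insn *code)
    {
        int acc = 0;
        for (const struct insn *ip = code; ; ip++) {
            switch (ip->op) {
            case OP_ADD:   acc += ip->arg; break;
            case OP_PRINT: printf("%d\n", acc); break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void)
    {
        struct insn prog[] = { {OP_ADD, 2}, {OP_ADD, 3},
                               {OP_PRINT, 0}, {OP_HALT, 0} };
        interpret(prog);   /* a dynamic binary translator would instead emit
                              host code for the whole block and cache it */
        return 0;
    }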
Binary translation may also be needed in systems where the hardware is not
virtualization-friendly; in these cases, the VMM can translate unsafe instructions from a VM into safe
instructions.
2.3.4. Paravirtualization
In relation to ISA translation, paravirtualization represents a different, possibly complementary
approach to virtualization. In paravirtualization, the guest code is modified to use a different
interface that is safer or easier to virtualize, that improves performance, or both. The interface
used by the modified guest will either access the hardware directly or use virtual resources under
the control of the VMM, depending on the situation, facilitating performance and reliability [89].
The Denali system uses paravirtualization in support of a lightweight, multi-VM environment
suited for networked application servers [100].
Paravirtualization comes, of course, at the cost of modifying the guest software, which may be
impossible or difficult to achieve and maintain. But in cases of well-maintained, open software
(such as Linux), paravirtualized software distributions may be conveniently available.
Like binary translation, paravirtualization can also serve in situations where underlying hardware
is not supportive of virtualization. The paravirtualization of the guest gives the VMM control
over all sensitive operations that must be virtualized and managed.
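
The following C sketch illustrates the paravirtual pattern: where native kernel code would update a page table entry directly, the modified guest invokes a hypercall instead. The hypercall number and the int $0x82 gateway are hypothetical stand-ins; real interfaces, such as Xen's, differ in detail.

    #define HCALL_SET_PT_ENTRY 3   /* invented hypercall number */

    static inline long hypercall2(long nr, long a1, long a2)
    {
        long ret;
        /* Trap into the hypervisor; 'int $0x82' stands in for whatever
         * gateway the real VMM defines. */
        __asm__ volatile("int $0x82"
                         : "=a"(ret)
                         : "0"(nr), "b"(a1), "c"(a2)
                         : "memory");
        return ret;
    }

    /* Paravirtualized page-table update: instead of writing the entry
     * directly (a sensitive operation), ask the VMM to validate and
     * perform it. */
    void set_pte(unsigned long pte_addr, unsigned long value)
    {
        hypercall2(HCALL_SET_PT_ENTRY, (long)pte_addr, (long)value);
    }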
2.3.5. Pre-virtualization
Pre-virtualization, or transparent paravirtualization, as it is sometimes called, attempts to bring
the benefits of both binary translation (which offers flexibility) and paravirtualization (which
brings performance). Pre-virtualization is achieved via an intermediary between the guest code
and the VMM -- this intermediary can come in the form of either a standard, neutral interface
agreed on by VMM and guest OS developers, or an automated offline translation process such as
using a special compiler. Both are offered by the L4Ka implementation of the L4 microkernel -- L4Ka supports the generic Virtual Machine Interface proposed by VMWare [92], and also
provides their Afterburner tool that compiles unmodified guest OS code with special notations
that enable it to run on a special, guest-neutral VMM layer [58].
Pre-virtualization aims to decouple the authoring of guest OS code from the usage of a VMM
platform, and thereby retain the security and performance enhancements of paravirtualization
without the usual development overhead -- a neutral interface or an offline compilation process
facilitates this decoupling. Pre-virtualization is a newer technique that bears watching.
2.3.6. Containers
Containers are an approach to virtualization that runs above a standard operating system but
provides a complete, lightweight, isolated virtual environment for collections of processes [89].
An example is the OpenVZ project for Linux [65], or the system proposed in [81].
Applications running in the containers must run natively on the underlying OS -- containers do
not support heterogeneous OS environments. But in such situations, containers can offer a less
resource-intensive path to system isolation than traditional virtualization.
One must, however, observe that a container system is not a minimal trusted hypervisor, but
instead runs as part of what may be a monolithic OS; hence, any security ramifications of the
container system architecture and its isolation mechanisms must be considered.
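
For a flavor of the OS-level mechanisms that container systems build on, the following Linux-specific C sketch uses clone() with namespace flags to start a process with its own PID view, so that it sees itself as process 1. Production container systems such as OpenVZ add file system and resource isolation on top, and running this sketch requires administrative privileges.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static char stack[64 * 1024];

    static int child(void *arg)
    {
        (void)arg;
        printf("inside container: pid = %d\n", getpid());  /* prints 1 */
        return 0;
    }

    int main(void)
    {
        /* New PID and hostname namespaces isolate the child's view
         * of the system from the host's. */
        pid_t pid = clone(child, stack + sizeof(stack),
                          CLONE_NEWPID | CLONE_NEWUTS | SIGCHLD, NULL);
        if (pid < 0) { perror("clone"); return 1; }
        waitpid(pid, NULL, 0);
        return 0;
    }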
2.4. Non-standard systems
The above discussion on the basics of virtualization has concerned itself with typical system
types, where layers of abstraction are used to expose higher and higher level interfaces to clients,
promoting portability and ease-of-use, and creating a hierarchy of responsibility based on
interface contracts. This common sort of architecture lends itself to virtualization. But it is worth
mentioning that there are other types of computer systems in existence that may not be so
amenable to virtualization. For instance, exokernels [37] take a totally different approach -- instead of trying to abstract and ``baby-proof" a system with higher and higher level interfaces,
exokernels provide unfettered access to resources and allow applications to work out the details
of resource safety and management for themselves. This yields much more control and power to
the application developer, but is more difficult and dangerous to deal with -- similar to the
difference between programming in C and Java.
3. Hypervisors
The hypervisor or VMM is the layer of software that performs system virtualization, facilitating
the use of the virtual machine as a system abstraction as illustrated in Figure 1.
Figure 1: Typical VM software architecture
3.1. Traditional hypervisors
Traditional hypervisors, such as Xen [19] and VMWare ESX [93], run on the bare metal and
support multiple virtual machines. This is the classic type of hypervisor, dating back to the 1970s
[41], when they commonly ran on mainframes. A traditional hypervisor must provide device
drivers and any other components or services necessary to support a complete virtual system and
ISA for its virtual machines.
To virtualize a complete ISA and system environment, traditional hypervisors may use
paravirtualization, as Xen does, or binary translation, as VMWare ESX does, or a combination of
both, or neither, depending on such aspects as system requirements and available hardware
support.
The Xen hypervisor originally required paravirtualization, but can now support full virtualization
if the system offers modern virtualization hardware support (see section 5). Additionally, Xen
deals with device drivers in an interesting way. Instead of having all the device drivers included
in the hypervisor itself, it instead uses the device drivers running in the OS found in the special
high-privilege Xen administrative domain, sometimes known as Dom0 [29] (ch. 6). Dom0 runs
an OS with all necessary device drivers. The other guests have been modified, as part of the
necessary paravirtualization, to use simple abstract device interfaces that the hypervisor then
implements through request and response communication with Dom0 and its actual device
drivers.
3.1.1. Protection rings and modes
In traditional hypervisor architecture, the hypervisor leverages a hardware-enforced security
mechanism known as privilege rings or protection rings, or the closely related processor mode
mechanism, to protect itself from guest VMs and to protect VMs from each other. The protection
ring concept was introduced in the Multics operating system in the 1970s [75]. With protection
rings, different types of code execute in different rings, with more privileged code running in
lower-numbered rings (ring 0 being the most privileged), and with only specific predefined gateway mechanisms able
to transfer execution from one ring to another. Processor modes function in a similar way. The
current mode is stored as a hardware flag, and only when in certain modes can particular
instructions execute. Transition between modes is a protected operation. For example, Linux and
Windows typically use two modes -- supervisor and user -- and only the supervisor mode can
execute hardware-critical instructions such as disabling interrupts, with the system call interface
enabling transition from user to supervisor mode [101]. Memory pages associated with different
rings or modes are protected from access by lower privilege rings or modes. Rings and modes
can be orthogonal concepts, coexisting to form a lattice of privilege state.
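
On x86, for example, the current ring is visible to software: the low two bits of the CS segment register hold the current privilege level (CPL). The following C sketch reads it from user space, where it prints 3, the least privileged ring.

    #include <stdio.h>

    int main(void)
    {
        unsigned short cs;
        __asm__ volatile("mov %%cs, %0" : "=r"(cs));
        printf("current privilege level: %u\n", cs & 3);
        return 0;
    }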
Following this pattern, the hypervisor commonly runs in the highest privilege ring or mode
(possibly a new mode above supervisor mode, such as a hypervisor mode), enabling it to oversee
the guest VMs and intercept and handle all important instructions affecting the hardware
resources that it must manage. This subject will be further discussed in section 5 on virtualization
hardware support.
3.2. Hosted hypervisors
A hosted hypervisor, such as VirtualBox [95] or VMWare Workstation [83][94], runs atop a
standard operating system and supports multiple virtual machines. The hypervisor runs as a user
application, and therefore so do all the virtual machines. Performance is preserved by having as
many VM instructions as possible run natively on the processor. Privileged instructions issued by
the VMs (for example, those that would normally run in ring 0) must be caught and virtualized
by the hypervisor, so that VMs don't interfere with each other or with the host. One potential
advantage of the hosted approach is that existing device drivers and other services in the host
operating system can be used by the hypervisor and virtualized for its virtual machines (as
opposed to the hypervisor containing its own device drivers), reducing hypervisor size and
complexity [79]. Additionally, hosted hypervisors often support useful networking
configurations (such as bridged networking, where each VM in effect obtains its own IP
address and can thereby network with the other VMs and the host), as well as sharing of resources with
the host (such as shared disks). Hosted hypervisors provide a convenient avenue for desktop
users to take advantage of virtualization.
3.3. Microkernels
Microkernels such as L4 [88] offer a minimal layer over the hardware to provide basic system
services, such as Interprocess Communication (IPC) and processes or threads with isolated
address spaces, and can serve as an apt base for virtualization [45]. (However, not everyone
agrees on that last point [16][42].) Microkernels typically do not offer device drivers or other
bulkier parts of a traditional hypervisor or operating system. To support virtualization, such
services are often provided by a provisioning application such as Iguana on L4 [62]. The virtual
machine runs atop the provisioning layer. Alternatively, an OS can be paravirtualized to run
directly atop the microkernel, as in L4Linux [57].
Microkernels can be small enough to support formal verification, providing formal assurance for
a system's Trusted Computing Base (TCB), as in the recently verified seL4 microkernel [53][63].
This may be of special interest to parties building systems for certification by the Common
Criteria [24], or in any domain where runtime reliability and security are mission-critical
objectives.
Microkernels can give rise to interesting architectures. Since other applications can be written to
run on the microkernel in addition to provisioned virtual machines, with each application running
in its own address space isolated by the trusted microkernel, a system can be built consisting of
applications and entire operating systems running side by side and interacting through IPC.
Furthermore, the company Open Kernel Labs ([64]) advertises an L4 microkernel-based
architecture where not only applications and operating systems, but also device drivers, file
systems, and other components can be run in isolated domains, and where device drivers running
in one operating system can be used by other operating systems via the mediation of the
microkernel. (This is similar to the device driver approach in Xen.)
3.4. Thin hypervisors
There is some debate as to what really constitutes a ``thin" hypervisor. How thin does it have to
be to be called thin? What functionality should it provide? VMWare ESXi, which installs
directly on server hardware and has a 32MB footprint [93], is advertised as an ultra-thin
hypervisor. But other hypervisors out there are considerably smaller, and one could argue that
32MB is still quite large enough to harbor bugs and be difficult to verify. The seL4 microkernel
has ``8,700 lines of C code and 600 lines of assembler" [53], and thus is quite a bit smaller while
still providing isolation (although not, in itself, capable of full virtual machine support).
SecVisor, a thin hypervisor intended to sit below a single OS and provide kernel integrity
protection, is even tinier, coming in at 1112 lines when proper CPU support for memory
virtualization is available [77] -- but of course, it offers still less functionality than seL4. This
also indicates that the term ``hypervisor" is a superset of ``virtual machine monitor", including as
well architectures that provide but a thin monitoring and possibly ISA virtualization layer
between a guest OS and the hardware.
There are numerous thin hypervisor architectures in the literature, including the aforementioned
SecVisor [77] and BitVisor [78]. Like traditional hypervisors and microkernels, thin hypervisors
run on the bare metal. We will be most interested in ultra-thin hypervisors that monitor and
interpose between the hardware and a single guest OS running above it. This presents the
opportunity to implement various services without the guest needing to know, including security
services. Since ultra-thin hypervisors are intended to be extremely small and efficient, they are
well suited to low-cost, low-resource computing environments such as embedded systems.
The issue of hardware support is especially relevant for ultra-thin hypervisors, since any
activities that can be handled by hardware relieve the hypervisor of extra code and complexity.
Since an ultra-thin hypervisor runs with such a bare-bones codebase, hardware support will be
instrumental in determining what it can do.
One interesting question is whether it is possible to create an ultra-thin hypervisor that will run beneath
a traditional hypervisor/VMM, instead of beneath a typical guest OS, and thereby effectively
provide security services for multiple VMs but still with an extremely tiny footprint. It is also
interesting to consider the possibility of multicore support in a thin hypervisor, given the added
complexity yet increasing relevance and prevalence of multicore hardware.
Thin hypervisors will be discussed more later in the context of security architecture.
4. Advantages of System Virtualization
Traditional system virtualization, by enabling entire virtual machines to be logically separated by
the hypervisor from the hardware they run on, creates compelling possibilities for system design.
Put another way, ``by freeing developers and users from traditional interface and resource
constraints, VMs enhance software interoperability, system impregnability, and platform
versatility." [79]. Virtualization yields numerous advantages, some of which are discussed in the
following sections.
4.1. Isolation
The fundamental advantage of virtualization is isolation between the virtual machines, or
domains, enforced by the hypervisor. (Domain is a more generic term than virtual machine, and
can capture any isolated domain, such as a microkernel address space.) This leads to robustness
and security.
It is worth mentioning that, nowadays, instead of traditional pure isolation, virtualization is also used
in architectures where virtual machines are intended to cooperate in some way (especially in
mobile and embedded platforms, discussed in a later section). Therefore it may be important for
the hypervisor to provide secure services for inter-VM communication, such as microkernel IPC.
4.2. Minimized trusted computing base
A user application depends on, or trusts, all the software running beneath it. A compromise in
any software beneath it on the stack, or in any other software that can compromise or control any
software on the stack, can compromise the application itself. In modern operating systems, where
software often runs with administrative privileges, a compromise of any piece of software can
result in total machine compromise and therefore be devastating to any other software running on
the machine. Such an architecture presents an immense attack surface -- the entire exposed
facade through which the attacker can approach the system. It could include user applications,
operating system interfaces, network services, devices and device drivers, etc.
Virtualization addresses this problem by placing a trustworthy hypervisor at the highest privilege
on the system and running virtual machines at reduced privilege. Software can be partitioned into
virtual machines that are trusted and untrusted, and a compromise of an untrusted VM will have
no effect on a trusted VM, since the hypervisor guards the gates, so to speak. Total machine
compromise now requires compromise of the hypervisor, which typically presents a much
slimmer attack surface than mainstream operating systems (although of course that varies in
practice). A slimmer attack surface means, in principle, that it is easier to protect correctly.
4.3. Architectural flexibility
The decoupling of virtual and real affords a great deal of architectural flexibility. VMs can be
combined on a single platform arbitrarily to meet particular needs. In the case of whole-system
VMMs that translate the ISA, the flexibility even extends to running VMs on more than one type
of hardware, and combining VMs meant for more than one type of hardware on a single
platform.
4.4. Simplified development
Virtualization can lead to simplified software development and easier porting. As mentioned,
instead of porting an application to a new operating system, an entire legacy software stack can
simply run in a virtual machine, alongside other operating systems, on a single platform. In the
case of ISA translation, instead of targeting every hardware platform, a developer can write for
one platform, and rely on virtualization to extend support to other platforms.
In addition to reducing the need for porting and developing across platforms, virtualization can
also facilitate more productive development environments, for instance by enabling a
development or testing workstation to run instances of all target operating systems.
Another example is that when developing a system typically comprised of multiple separate
machines, system virtualization can be used to virtualize all these machines on a single machine
and connect them with a virtual network. This approach can also be used to facilitate product
demos of such systems -- instead of bringing all the separate machines to a customer, a laptop
hosting all the necessary virtual machines can be used to portably demonstrate system
functionality.
4.5. Management
The properties of virtualization result in many interesting benefits when it comes to system
management.
4.5.1. Consolidation/Resource sharing
Virtualization can increase efficiency in resource utilization via consolidation [44][54]. Systems
with lower needs can be run together on single machines. More can be done with less hardware.
Virtualization's effectiveness in reducing costs has been known for decades [41].
4.5.2. Load balancing and power management
In the same vein as consolidation, virtualization can be used to balance CPU load by moving
VMs off of heavily loaded platforms (load balancing), and can also be used to combine VMs
from lightly loaded machines onto fewer machines in order to power down unneeded hardware
(power management) [44][54].
4.5.3. Migration
Virtual machines can be migrated live (that is, in the middle of execution) between systems.
Research has been done to support virtualization-based migration even on mobile platforms [84].
In theory, computing context could be migrated between any compatible devices capable of
virtualization. Challenges include ensuring that a fully compatible environment is provided for
virtual machines in each system they migrate to (including a consistent ISA), so that execution
can be safely resumed. Besides further enabling the above-mentioned management applications
of consolidation and load balancing, migration supports new scenarios where working context is
seamlessly transitioned between environments, such as for employees working in multiple
corporate offices, client sites, and travel in between.
4.6. Security
Last but definitely not least, virtualization can provide security advantages, and is moving more
and more in this direction [54]. Of course, these advantages are founded on the minimized TCB
and VM/VMM isolation mentioned earlier, the basic properties that make virtualization
attractive in secure system design. But building upon these foundational properties can lead to
substantial additional security benefit.
A hypervisor has great visibility into and control over its virtual machines, yet is isolated from
them, and thus forms an apt base for security services of many and varied persuasions. An
interesting aspect of virtualization-based security architecture is that it can bring security services
to unmodified guest systems, including commodity platforms.
By using virtualization in the creation of secure systems, designers can reap not only the bounty
of isolated domains, but additionally the harvest of whatever security services the hypervisor can
support. A later section will discuss virtualization-based security services in greater detail.
4.7. Typical Virtualization Scenarios
4.7.1. Hosting center
Hosting centers can use virtualization to provide systems for clients. Clients can share time on
virtualized systems with quality of service guarantees. Restricted to their own isolated domains,
clients are prevented from interfering with each other. This scenario sounds quite similar to
the time-sharing mainframes of yesteryear, and indeed the two bear a strong resemblance. The hosting
center is a very typical virtualization use-case, where VMs are purely isolated and share
resources according to a local policy.
4.7.2. Desktop
Virtualization on the desktop is becoming much more common nowadays, which has inspired
(and is inspired by) progress in virtualization support in commodity desktop hardware [61]. In
corporations, especially development houses, virtualization is used to give engineers easy access
to multiple target platforms. Another possible corporate scenario is enabling employees to have
virtual machines configured for different clients or workplace scenarios on one machine. With
VirtualBox freely available, even home users can cheaply leverage virtualization to access
multiple operating systems or partition their system into trusted and untrusted domains.
Virtualization gives desktop users the freedom to have all the heterogeneous computing
environments they need at their fingertips, without absorbing extra hardware cost.
4.7.3. Service provider
A service provider (such as a web service provider) may utilize virtualization to consolidate
resources or servers onto fewer hardware platforms. For instance, a web application may have a
front end web server and multiple back end tier servers, hosted as virtual machines on a single
physical machine.
4.7.4. Mobile/embedded
Lastly, a quickly emerging virtualization scenario is the mobile/embedded arena -- it is becoming
more and more common now to have mobile devices containing isolated domains entrusted with
different purposes [85], such as an employee smartphone containing isolated home and work
environments [54]. With processors shrinking in size and increasing in performance, growing
numbers of embedded systems have the power to support virtualization and leverage its benefits.
Embedded CPUs with multiple cores and/or built-in security/virtualization support, such as ARM
TrustZone (discussed in section 5), further enhance the possibilities.
Multiple companies are working in the mobile virtualization space, including Open Kernel Labs2,
VirtualLogix3, and now VMWare4. It has been found to be not unduly onerous to port
virtualization architectures to mobile platforms [25], and open systems such as the L4
microkernel [88] and Xen on ARM [46][103] afford open, low-cost solutions.
Therefore, the benefits of virtualization already discussed can be brought to mobile systems, in
addition to enabling applications and benefits specific to the mobile/embedded environment.
For example, due to the high frequency of hardware changes and the wide variety of available
platforms in embedded systems, virtualization can provide an especially convenient layer of
abstraction to facilitate application development. Applications could be distributed as an entire
software stack (including a specific OS) to run in a VM, and therefore not depend on any
particular ABI [44]. Isolated virtual machines can serve as mobile testbed components or nodes
in opportunistic mobile sensor networks [32], and support heterogeneous application
2 http://www.ok-labs.com/
3 http://www.virtuallogix.com/
4 http://www.vmware.com/technology/mobile/
environments [44]. Modularity and live system migration are of special interest in the mobile
environment. Virtualization can also support mobile payment, banking, ticketing, or other similar
applications via isolated trusted components (as in TrustZone design tiers) -- for instance,
Chaum's vision of a digital wallet, with one domain controlled by the bank and one domain by
the user [27], could potentially be implemented with virtualization, enabling people to carry ``e-cash" in their PDA or smartphone. And of course, beyond isolation, many aspects of security in
embedded scenarios may be served by virtualization, as will be discussed later.
5. Hardware Support for Virtualization
Virtualization benefits from support in the underlying hardware architecture. If hardware is not
built with system virtualization in mind, then it can become difficult or impossible to implement
virtualization correctly and efficiently. Challenges can include virtualization of the CPU,
memory, and device input/output. For example, if a non-privileged CPU instruction (that is, a
portion of the ISA that non-privileged user code is still permitted to execute) can modify some
piece of privileged hardware state for the entire machine, then one virtual machine is effectively
able to modify the system state of another virtual machine. The VMM must prevent this breach
of consistency. In another common example relating to memory virtualization, standard page
tables are designed for one level of virtualized memory, but virtualization requires two -- one
layer for the VMM to virtualize the physical memory for the guest VMs, and one layer for the
guest VMs to virtualize memory for their own processes. Lacking hardware support for this second level of paging, the VMM must maintain so-called shadow page tables in software, as illustrated in Figure 2, which incurs performance penalties. In another example, regarding device I/O, where devices use DMA to
write directly to memory pages, a VMM must ensure that devices being used by one VM are not
allowed to write to memory used by another VM. If the VMM must validate every I/O operation
in software, it can be expensive. There are many other potential issues with hardware and
virtualization, mostly centering around the cost and difficulty of trapping/intercepting and
emulating instructions and dealing with overhead from frequent context switches in and out of
the hypervisor and VMs whenever privileged state is accessed. It is important that hardware
contain mechanisms for dealing with virtualization issues if virtualization is to be effectively and reasonably supported.
Figure 2: Usage of shadow page tables
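To make the shadow paging approach concrete, the following simplified C sketch (a model only, with invented names; a real VMM must also track permissions, dirty bits, and multi-level tables) shows how a VMM might resynchronize a shadow page table entry when a guest's write to its own page table traps:

    /* Model of shadow paging: the guest edits its own page table, but the
     * MMU walks a VMM-maintained shadow table that maps guest-virtual
     * pages directly to host-physical pages. Illustrative only. */
    #include <stdint.h>

    #define NPAGES 1024

    static uint32_t guest_pt[NPAGES];  /* guest virtual -> guest physical */
    static uint32_t p2m[NPAGES];       /* guest physical -> host physical */
    static uint32_t shadow_pt[NPAGES]; /* guest virtual -> host physical  */

    /* Invoked when a guest write to its page table traps to the VMM. */
    void vmm_on_guest_pte_write(uint32_t gvpn, uint32_t gppn)
    {
        guest_pt[gvpn] = gppn;          /* emulate the guest's update   */
        shadow_pt[gvpn] = p2m[gppn];    /* keep the shadow synchronized */
    }

The cost lies in trapping every guest page-table update; hardware-assisted nested paging (see the EPT and nested paging discussions later in section 5) removes this per-update synchronization.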
Without hardware support, VMMs can also rely on the aforementioned paravirtualization, in
which the source code of an operating system is modified to use a different interface to the
VMM that the VMM can virtualize safely and efficiently, or the already described binary
translation [61], in which the VMM translates unsafe instructions at runtime. Neither of these
solutions is ideal, since paravirtualization, while effective and often resulting in performance
enhancements, requires source-code level modification of an operating system (something not
always easy or possible), and translation, as stated earlier, can be resource intensive and
complicated. (Pre-virtualization could offer a better solution here.) Specifically regarding I/O
virtualization without hardware support, a VMM can emulate actual devices (so that device
instructions from VMs are intercepted and emulated by the VMM, analogous to binary translation),
supporting existing interfaces, or it can provide specially crafted new device interfaces to its
VMs [49]. Emulating devices in a VM can be slow, and difficult to implement correctly, while
providing a new interface requires modification to a VM's device drivers and/or OS, which may
be inconvenient. Besides sidestepping these troubles, having hardware shoulder more of the
burden for virtualization support can simplify a hypervisor's code overall, further minimizing the
TCB, easing development, and raising assurance in security [61]. There are other software-based
solutions for enabling virtualization without hardware support, such as the ``Gandalf" VMM [50]
that attempts to implement lightweight shadow paging for memory management, but it is
unlikely that a software-based solution will be able to compete with a competent hardware-based
solution.
5.1. Basic virtualization requirements
Popek and Goldberg outlined basic requirements for a system to support virtual machines in
1974 [69]. The three main requirements are summed up in a simple way in [2]:
1. Fidelity -- Also called equivalency, fidelity requires that software running on a virtual machine behave identically to the same software running on a real machine (excepting time-related issues).
2. Performance -- Execution should be reasonably efficient, which is achieved by having as many instructions as possible run natively, directly on the hardware, without trapping to the VMM.
3. Safety -- The hypervisor or VMM must have total control over the virtualized
hardware resources.
Many modern hardware platforms were not designed to support virtualization and did not meet
the fidelity requirement out of the box, meaning that VMM software had to do extra work -- negatively impacting the performance requirement. But today, CPUs are being built with more
built-in virtualization support, including chips by Intel and AMD, and are actually able to meet
Popek and Goldberg's requirements.
5.2. Challenges in x86 architecture
Intel x86 CPU architecture formerly offered no virtualization support, and indeed included many
issues that hindered correct virtualization (necessitating binary translation or paravirtualization).
As a common architecture, it is worth taking a closer look at some of its issues. Virtualization
challenges in Intel x86 architecture include (as described in [61]):
Certain IA-32 and Itanium instructions can reveal the current protection ring
level to the guest OS. Under virtualization, the guest OS will be running in a
lower-than-normal privilege ring. Therefore, being able to discern the current
ring breaks Popek and Goldberg's fidelity condition, and can reveal to the guest
that it is running in a virtual machine.
In general, if a guest OS is made to run at lower privilege than ring 0, issues
may arise if any portion of the OS was written expecting to be run in ring 0.
Some IA-32 and Itanium non-faulting instructions (that is, non-trapping, non-privileged instructions) modify privileged CPU state. User-level code can
execute such instructions, and they don't trap to the operating system.
Therefore, VMs can issue non-trapping instructions that modify state affecting
other VMs.
IA-32 SYSENTER and SYSEXIT instructions, typically used to start and end
system calls, cause a trap to and exit from ring 0, respectively. If SYSEXIT is
called outside ring 0, it causes a trap to ring 0. With a VMM running at ring 0,
SYSENTER and SYSEXIT will therefore trap to the VMM -- both on system call entry (when the user application calls SYSENTER, trapping to ring 0) and on exit (when the guest OS, not at ring 0, calls SYSEXIT, resulting in a trap to ring 0).
This creates additional overhead and complication for the VMM.
Activating and deactivating interrupt masking (for blocking of external
interrupts from devices) by the guest OS is a privileged action and may be a
frequent activity. Without hardware support, it could be costly for a VMM to
virtualize this functionality. This concern also applies to any privileged CPU
state that may be accessed frequently.
Also relating to interrupt masking, the VMM may have to deliver virtual
interrupts to a VM, but the guest OS may have masked interrupts. Some
mechanism is required to ensure prompt delivery of virtual interrupts from the
VMM when the guest deactivates masking.
Some aspects of IA-32 and Itanium CPU state are hidden -- meaning they are
inaccessible for reading and/or writing by software -- and it is therefore
impossible for a context switch between VMs to properly transition that state.
Intel CPUs typically contain four protection rings. The hypervisor runs at ring
0. In 64-bit mode, the paging-based memory protection mechanism doesn't
distinguish between rings 0-2; therefore, the guest OS must run at ring 3,
putting it at the same privilege level as user applications (and therefore leaving
the guest OS less protected from the applications running on it). This
phenomenon is known as ring compression.
Modern Intel and AMD CPUs offer hardware support to deal with these challenges. Prominent
aspects of hardware virtualization support include support for virtualization of CPU, memory,
and device I/O, as well as support for guest migration.
5.3. Intel VT
Intel Virtualization Technology (VT) is a family of technologies supporting virtualization on
Intel IA-32, Xeon, and Itanium platforms. It includes elements of support for CPU, memory, and
I/O virtualization, and guest migration.
Intel VT on IA-32 and Xeon is known as VT-x, whereas Intel VT for Itanium is known as VT-i.
Of those two, this document will focus on VT-x. Intel VT also includes a component known as
VT-d for I/O virtualization, discussed later in this section, and VT-c for enhancing virtual
machine networking, which is not discussed.
5.3.1. VT-x
Technologies under the VT-x heading include support for CPU and memory virtualization, as
well as guest migration.
A foundational element of Intel VT-x's CPU virtualization support is the addition of a new bit of
CPU state, orthogonal to protection ring, known as VMX root operation mode [61]. (Intel VT-i
has a similar new bit -- the ``vm" bit in the processor status register, or PSR.) The hypervisor
runs in VMX root mode, whereas virtual machines do not. When executed outside VMX root
mode, certain privileged instructions will invariably trap to VMX root mode (and hence the
VMM), and other instructions and events (such as different exceptions) can also be configured to
trap to VMX root mode. A transition out of VMX root mode (that is, into a guest) is called a VM entry, and a transition back into VMX root mode is called a VM exit. VM entries and exits are managed in hardware via a structure known as the
Virtual Machine Control Structure (VMCS). The VMCS stores virtualization-critical CPU state
for VMs and the VMM so that it can be correctly swapped in and out by hardware during VM
entries and exits, freeing VMM software from this burden. Note also that the VMCS contains
and provides access to formerly hidden CPU state, so that the entire CPU state can be virtualized.
The VMCS stores the configuration for which optional instructions and events will trap to VMX
root mode. This enables the VMM to ``protect" appropriate registers, handle certain instructions
and exceptions, handle activity on certain input/output ports, and other conditions. A set of CPU
instructions provides the VMM with configuration access to the VMCS.
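The resulting control flow can be pictured as a loop in the VMM. The C model below is purely illustrative -- the structure fields, exit codes, and vm_enter() stand in for the hardware VMCS, the real exit reasons, and the actual VMX entry instructions:

    /* Model of a VMM's VM entry/exit loop around a VMCS-like structure. */
    #include <stdint.h>

    struct vmcs_model {
        uint64_t guest_rip;     /* guest state saved/restored by hardware */
        uint64_t exit_reason;   /* why the last VM exit occurred          */
        int      halted;
    };

    enum { EXIT_CPUID, EXIT_IO, EXIT_HLT };

    static void vm_enter(struct vmcs_model *v)
    {
        /* Stand-in for hardware VM entry (VMLAUNCH/VMRESUME): the guest
         * runs until a configured instruction or event forces a VM exit. */
        v->exit_reason = EXIT_HLT;
    }

    void vmm_run(struct vmcs_model *v)
    {
        while (!v->halted) {
            vm_enter(v);                /* VM entry: leave VMX root mode  */
            switch (v->exit_reason) {   /* VM exit: back in VMX root mode */
            case EXIT_CPUID: /* emulate the instruction, advance guest_rip */ break;
            case EXIT_IO:    /* virtualize the port access                 */ break;
            case EXIT_HLT:   v->halted = 1; break;
            }
        }
    }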
Regarding interrupt masking and virtualization, the interrupt masking state of each VM is
virtualized and maintained in the VMCS. Further, VT-x provides a control feature whereby a
VMM can force traps on all external interrupts and prevent a VM from modifying the interrupt
masking state (and attempts by the guest to modify the state won't trap to the VMM). There is
also a feature whereby a VMM can request a trap if the VM deactivates masking [61]. Therefore,
if masking is active, the VMM can request a trap when masking is again deactivated -- and then
deliver a virtual interrupt.
Additionally, it is important to observe that since VMX root mode is orthogonal to protection
ring, a guest OS can still run at ring 0 -- just not in VMX root mode. This alleviates any
problems arising from a guest OS running at lower privilege but expecting to run at ring 0 (or
from a guest OS being able to detect that it isn't running in ring 0). It also solves the problem of
SYSENTER and SYSEXIT always faulting to the VMM and thus impacting system call
performance -- now, they will behave as expected, since the guest OS will run in ring 0.
Another salient element of VT-x's CPU virtualization support is hardware support for
virtualizing the Task Priority Register (TPR) [61]. The TPR resides in the Advanced
Programmable Interrupt Controller (APIC), and tracks the current task priority -- only interrupts
of higher priority will be delivered. An OS may require frequent access to the TPR to
manage task priority (and therefore interrupt delivery and performance), but a guest OS must not
modify the state for any other guest OSes, and trapping frequent TPR access in the VMM could
be expensive. Under VT-x, a virtualized copy of the TPR for each VM can be kept in the VMCS,
enabling the guest to manage its own task priority state -- and a VM exit will only occur when
the guest attempts to drop its shadow value below a threshold value also set in the VMCS [61].
The VM can therefore modify, within set bounds, its TPR -- without trapping to the VMM. (This
technology is advertised as Intel VT FlexPriority.)
Moving on from virtualization of the CPU, Intel VT-x also now contains a feature called
Extended Page Tables (EPTs) [44], which supports virtualized memory management. Standard
hardware page tables translate from virtual page numbers to physical page numbers. In
virtualization scenarios, use of these basic page tables requires frequent synchronization effort
for the VMM, since (as described in the beginning of section 5) the VMM needs to virtualize the
physical page numbers for each guest. The VMM must somehow maintain the physical
mappings for each guest VM. With EPTs, there are now two levels of page tables -- one page
table translates from ``guest virtual" to ``guest physical" page numbers for each VM, and a
second page table translates from ``guest physical" to the ``host physical" page numbers that
correspond to actual physical memory. In this way, a VM is free to access and use its own page
tables, mapping between the VM's own virtual and ``guest physical" addresses, in a normal way,
without needing to trap to the VMM -- resulting in performance savings.
However, EPTs do result in a longer page table ``walk" (a page table walk is the process of
``walking" though the page tables to find the physical address corresponding to a virtual
address), due to the second page table level. Therefore, if a process incurs many TLB misses,
necessitating many page table walks, performance could suffer. One possible solution to this
problem is to increase page size, which could reduce the number of TLB misses (depending on
the process's memory layout).
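A simplified model of the two-level translation looks as follows (illustrative flat tables stand in for the real multi-level structures):

    /* Nested translation with EPTs: guest virtual -> guest physical via
     * the guest's own page table, then guest physical -> host physical
     * via the VMM's extended page table. Flat tables for clarity. */
    #include <stdint.h>

    #define NPAGES     1024
    #define PAGE_SHIFT 12
    #define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

    static uint64_t guest_pt[NPAGES]; /* maintained freely by the guest OS */
    static uint64_t ept[NPAGES];      /* maintained by the VMM             */

    uint64_t translate(uint64_t gva)
    {
        uint64_t gvpn = (gva >> PAGE_SHIFT) % NPAGES;
        uint64_t gppn = guest_pt[gvpn];       /* first level:  guest walk */
        uint64_t hppn = ept[gppn % NPAGES];   /* second level: EPT walk   */
        return (hppn << PAGE_SHIFT) | (gva & PAGE_MASK);
    }

Note that on real hardware every step of the guest's own multi-level walk is itself translated through the EPT, which is why nested walks lengthen, as discussed above.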
Another VT-x feature supporting memory virtualization is the Virtual Processor Identifier (VPID), which enables a VMM to maintain a unique ID for each virtual processor it runs (and for itself). TLB entries can then be tagged with a VPID, and therefore the TLB won't have
to be flushed (which is expensive) in VM entries and exits ([61]), since entries for different VMs
are distinguishable.
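The effect of VPID tagging can be modeled as follows (a sketch with invented structure names, not the actual hardware TLB format):

    /* A VPID-tagged TLB: entries belonging to different virtual processors
     * coexist, so VM entries and exits need not flush the TLB. */
    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_SIZE 64

    struct tlb_entry {
        bool     valid;
        uint16_t vpid;  /* which virtual processor owns the entry */
        uint64_t gvpn;  /* guest virtual page number              */
        uint64_t hppn;  /* host physical page number              */
    };

    static struct tlb_entry tlb[TLB_SIZE];

    /* A hit requires both the page number and the VPID to match. */
    bool tlb_lookup(uint16_t vpid, uint64_t gvpn, uint64_t *hppn)
    {
        for (int i = 0; i < TLB_SIZE; i++) {
            if (tlb[i].valid && tlb[i].vpid == vpid && tlb[i].gvpn == gvpn) {
                *hppn = tlb[i].hppn;
                return true;
            }
        }
        return false;
    }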
Finally, VT-x includes a component dubbed ``FlexMigration" that facilitates migration of guest
VMs among supporting Intel CPUs. Migration of guest VMs in a varied host pool can be
challenging, since guest VMs may query the CPU for its ID and thereafter expect the presence of
a certain instruction set, but then may be migrated to another system supporting slightly different
instructions. FlexMigration helps possibly heterogeneous systems in the pool to expose
consistent instruction sets to all VMs, thus enabling live guest migration.
5.3.2. VT-d
Device I/O uses DMA, enabling devices to write directly to memory pages without going
through the operating system kernel. (DMA for devices has been a source of security issues in
the past, with devices such as Firewire devices being able to write to kernel memory, even if
accessed by an unprivileged user. Attacks on the system via DMA are sometimes called ``attacks
from below".) The problem with DMA for devices on virtualization platforms is that devices
being used by a guest shouldn't be allowed to access memory pages on the system belonging to
other guests or the VMM -- therefore, on traditional systems, all device I/O operations must be
checked with or virtualized by the VMM, thereby reducing performance. Hardware support can
enable guest associations and memory access permissions to be established for devices and
automatically checked for any I/O operation.
Intel VT for Directed I/O (also known as Intel VT-d) offers hardware support for device I/O on
virtualization platforms [49]. It provides several key features (as described in [49]):
Device assignment -- The hardware enables specification of numerous isolated
domains (which might correspond to virtual machines on a virtualization
platform). Devices can be assigned to one or more domains, so that they can only
be used by those domains. In particular, this allows a VM domain to use the
device without trapping to the VMM.
DMA remapping -- through use of I/O page tables, the pages included in each I/O
domain and the pages that can be accessed by each device can be restricted.
Furthermore, pages that devices write to can be logically remapped to other
physical pages. In I/O operations, the page tables are consulted to check if the
page in question may be accessed by the device in question on behalf of the
current domain. Different I/O domains are effectively isolated from each other.
Note that this feature is necessary to make device assignment safely usable -- since it prevents a device assigned to one domain from accessing pages belonging to another domain (see the sketch following this list).
Interrupt remapping -- Device interrupts can be restricted to particular domains,
so that devices only issue interrupts to the domains that are expecting them.
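As a rough illustration of the DMA remapping check referenced above (the structure layout and names are invented; the real VT-d tables are multi-level hardware structures), a transaction is permitted and remapped only if the device's domain maps the page:

    /* Model of a VT-d-style DMA remapping check. */
    #include <stdbool.h>
    #include <stdint.h>

    #define IO_PAGES    1024
    #define NUM_DOMAINS 4

    struct io_domain {
        bool     present[IO_PAGES]; /* may this domain touch the page?    */
        uint64_t remap[IO_PAGES];   /* logical -> physical page remapping */
    };

    static struct io_domain domains[NUM_DOMAINS];
    static int device_domain[256];  /* device table: source id -> domain  */

    bool dma_check(int source_id, uint64_t io_page, uint64_t *phys_page)
    {
        struct io_domain *d = &domains[device_domain[source_id]];
        if (io_page >= IO_PAGES || !d->present[io_page])
            return false;           /* block the transaction */
        *phys_page = d->remap[io_page];
        return true;                /* allow, with remapping */
    }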
DMA remapping offers a plethora of potential uses, both for standard systems with a single OS
and for VMMs with multiple VMs [49]. For standard systems, DMA remapping can be used to
protect the operating system from devices (by prohibiting device access to kernel memory
pages), and to partition system memory into different I/O domains to isolate the activity of
different devices. It can also be used on 64-bit systems to support legacy 32-bit devices that are
only equipped to write to a 4GB physical address space; the addresses the device writes to can be
remapped to higher addresses in the larger system address space (which would otherwise require
expensive OS-managed bounce buffers).
A VMM, on the other hand, might simply assign devices to domains (which will most likely
correspond to VMs), and devices will thereby be restricted to operating only on memory owned
by that domain (VM). As mentioned, this will also enable guest VMs (and their device drivers)
to interact with their assigned I/O devices without trapping to the VMM. Furthermore, the VMM
can assign devices to multiple domains to facilitate I/O sharing or communication. Finally, if the
VMM virtualizes the DMA remapping instructions for its VMs, then the guest VMs can use the
remapping support in a similar way to an OS on a standard system -- protecting the OS, limiting
and partitioning the memory regions that a device can write to, and remapping regions for legacy
devices. To virtualize the remapping instructions and state, the VMM could maintain this state
(in an eagerly updated ``shadow copy" [49]) for each VM, by intercepting VM modification of
its I/O page tables and VM usage of the registers controlling the remapping. (Perhaps a future
hardware revision could provide built-in hardware support for virtualization of the remapping
facilities.)
The interrupt remapping component of VT-d can also be put to multiple uses by a VMM [49]. A
VMM can ensure that device-generated interrupts are routed only to the domains that the devices
are assigned to. It can also use the remapping hardware as a kind of ``interrupt firewall" to
ensure that external interrupts do not have characteristics that would cause them to be confused
with internal VMM interrupts. Finally, the interrupt remapping can be used to enable safe
migration of interrupts (the transfer of interrupts to the correct processor) when the associated
domain/workload has moved to another processor -- useful in load balancing situations.
5.4. AMD-V
AMD's version of virtualization support is entitled AMD-V [6], and offers comparable support
for CPU, memory, and I/O virtualization, and migration.
5.4.1. CPU
AMD-V incorporates a new bit of CPU state entitled ``guest mode" [7] that is analogous to non-VMX-root mode in Intel VT-x. Guest mode is entered via the VMRUN instruction. Whenever
VMRUN is called for a specific VM, the hardware accesses a structure called a Virtual Machine
Control Block (VMCB) for that VM. The VMCB stores configuration information on what
events and interrupts should be intercepted by the VMM for that guest, as well as CPU state for
that VM, and bits to indicate additional special instructions for preparing the VM's execution
environment. On VMRUN, the VMCB is used to swap in the VM CPU state, and VMM state is
saved to memory for later.
AMD-V also offers similar support to Intel for interrupt virtualization [7]. First, it has a master
bit in the VMCB that activates or deactivates interrupt virtualization -- if active, then the guest
interrupt masking bit only controls virtual interrupts, and the VMM's interrupt masking bit
controls physical (external) interrupts. (If interrupts aren't virtualized, the guest controls both
physical and virtual interrupt masking.) If interrupts are virtualized, then the TPR value for each
guest is also virtualized. The VMM can choose to intercept all physical interrupts, deliver virtual
interrupts to guests, and also force a trap when a VM with interrupts masked enables them once
again. There are additionally mechanisms for the VMM to clear out the pending interrupt queue
in an arbitrary manner or disregard certain interrupt vectors when determining the highest
priority pending external interrupt -- this can help in the case of a VM that is blocking other VMs
by not processing its own external interrupts.
5.4.2. Memory
Rapid Virtualization Indexing, also known as Nested Paging, is AMD's version of hardware support for
virtualization memory management [5]. Like Intel's EPTs, it incorporates a second level of
hardware page tables, eliminating the need for shadow paging. It functions similarly to EPTs,
and has been shown to yield dramatic performance increases (but, likewise, potentially suffers
from the problem of slower page table walks) [91].
Address Space Identifiers (ASIDs) are used to eliminate the need for TLB flushes when
switching to a new VM [5]. An ASID is a unique ID assigned to each guest by the hypervisor,
and is used to tag TLB entries, so that TLB entries for different VMs can be distinguished. It is
similar to Intel's VPID feature, and basically updates the TLB along with the page tables to
support a two-level virtual memory scheme.
5.4.3. Migration
AMD-V Extended Migration also provides hardware support for live migration of VMs between
AMD Opteron processors in a pool of systems [8]. This support includes features to facilitate
backward compatibility (by limiting the instruction set features exposed to guests to the lowest
common denominator of all systems in the pool) and forward compatibility (by allowing VMMs
to disable instructions found on newer processors that guests do not expect to be present). In
other words, similar to Intel's FlexMigration, it helps ensure that a guest will never find an
unexpected instruction environment, no matter where it migrates to in the pool.
5.4.4. I/O
Similar to Intel VT-d, AMD-V contains a component termed an I/O Memory Management Unit
(IOMMU) (previously DEV) that provides support for I/O virtualization [4]. It uses similar
components -- through I/O page tables, I/O memory accesses are checked for permissibility and
remapped. Through a device table, devices can be assigned to certain domains, which correspond
to a particular portion of the I/O page tables (and therefore memory regions and remappings).
And, through an interrupt mapping table, interrupts are checked for permissibility and routed to
the appropriate domains.
It is worth mentioning that, due to AMD64 systems consisting potentially of multiple processors
and device nodes that are spread out and connected with AMD ``HyperTransport" links, an
IOMMU can only intercept I/O memory accesses if the operation goes through the IOMMU
node in the HyperTransport network -- therefore, multiple IOMMUs can be necessary to cover all
devices [4].
5.5. ARM TrustZone
ARM TrustZone technology, for ARM11 and ARM Cortex embedded processors (including ARM
Cortex-A9 MPCore multicore processors), offers support for creating two securely isolated
virtual cores (or ``worlds", as they are termed) on a single real core. One world is Secure and one
world is Normal, and TrustZone manages transitions between them, preventing state or data from
leaking from the Secure world to the Normal world [13]. While overall less developed and more
limited in capabilities than Intel VT or AMD-V, and intended more for supporting security
architectures in general, it does offer some components similar to those found in x86
virtualization support packages. It is described in detail in [13], and also in [11].
To begin with, the system bus control signals now contain one extra bit, the NS or
``Non-Secure" bit, that functions like a 33rd address bit to differentiate between the two worlds.
Each virtual core has its own address space -- through special TrustZone memory controllers
([12][15]), physical memory is statically assigned to the Secure or Normal worlds. Furthermore,
TrustZone provides a feature called the Advanced Peripheral Bus (APB) that is connected to the
main system bus by a bridge component -- this bridge component enforces security for all
peripherals on the APB, and can deny insecure or otherwise problematic transactions from being
dispatched to peripherals. Hardware devices can be assigned to the Secure or Normal world. This
enables tight control of, for example, the interrupt controller, screen and keyboard.
Both worlds have user and privileged modes, as in normal operating systems. But the Secure
world also contains a special mode called ``Monitor Mode" that is responsible for context
switching between the two worlds. The secure monitor call (SMC) instruction always traps to
Monitor mode. External interrupts and aborts can also be made to trap to Monitor mode, but
system calls, MMU memory faults, and misuse of undefined or privileged instructions can't be
configured to trap to Monitor mode. The Secure and Normal worlds and Monitor mode have
their own exception handlers. Monitor mode is responsible for swapping in and out CPU state
(i.e., registers) when switching from one world to another, enabling execution to begin where it
left off in whichever world is being switched to.
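The monitor's job on a world switch can be sketched as follows (a model with invented names; on real hardware the SMC trap, the banked registers, and the NS bit are managed by the processor):

    /* Model of TrustZone Monitor-mode world switching: save the outgoing
     * world's CPU state, restore the incoming world's, flip the world. */
    #include <stdint.h>

    struct world_ctx {
        uint64_t regs[16];  /* general-purpose registers */
        uint64_t pc, sp;    /* resume point and stack    */
    };

    static struct world_ctx secure_ctx, normal_ctx;
    static int in_secure_world = 1;  /* the system boots Secure first */

    /* Invoked from the (hypothetical) SMC trap handler, interrupts masked. */
    void monitor_switch_world(struct world_ctx *cpu)
    {
        struct world_ctx *from = in_secure_world ? &secure_ctx : &normal_ctx;
        struct world_ctx *to   = in_secure_world ? &normal_ctx : &secure_ctx;
        *from = *cpu;        /* save where the outgoing world left off */
        *cpu  = *to;         /* resume the incoming world where it was */
        in_secure_world = !in_secure_world;
    }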
TrustZone supplies two virtual MMUs for its two worlds, enabling each one to manage its own
virtual to physical mappings for greater efficiency and isolation. Note that the Secure world can
map in pages from the Normal world, but not the other way around. Important MMU state (such
as the location of page tables) is kept independently for each world. Additionally, TLB entries
are tagged with the associated world, to prevent the need for TLB flushes in a context switch.
Cache entries are also tagged with the associated world, easily facilitating cache usage by both
worlds.
External interrupts generated for either world can be handled efficiently; if they are destined for
the currently running world, then they are delivered immediately, whereas if they are intended
for the other world, execution can trap to Monitor mode and then the interrupt can be properly
routed. The Monitor typically runs in a non-interruptible state (interrupts masked).
In addition to the above-described mechanisms, one could say that security begins on TrustZone
platforms with the secure boot process initiated when the device is powered on. The hardware
bootloaders that kick off the process can utilize public key cryptography to verify the integrity of
code at each successive step in the process (creating a chain of trust), and can leverage some kind
of TPM or other tamper-resistant module. The system always boots into the Secure world first,
and the Secure world then loads the Normal world -- this prevents untrusted code in the Normal
world from making unauthorized system changes before the Secure world has properly prepared
the system.
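One link of such a chain of trust can be sketched as below; verify_signature() is a placeholder for a real public-key verification (e.g., against a key stored in a tamper-resistant module), and all names are illustrative:

    /* Sketch of one stage in a secure boot chain: verify, then jump. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct image {
        const uint8_t *code;
        size_t         len;
        const uint8_t *sig;   /* signature over the code */
    };

    static bool verify_signature(const struct image *img, const uint8_t *pubkey)
    {
        /* Placeholder: a real stage hashes img->code and checks img->sig
         * with a public-key algorithm using the trusted key. */
        (void)img; (void)pubkey;
        return true;
    }

    void boot_next_stage(const struct image *next, const uint8_t *trusted_key)
    {
        if (!verify_signature(next, trusted_key))
            for (;;) ;                  /* refuse to run unverified code */
        ((void (*)(void))(uintptr_t)next->code)(); /* enter verified stage */
    }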
There are numerous other features available in the TrustZone hardware ``library", including a
special DMAC capable of simultaneously handling channels for the Secure and Normal worlds.
As previously covered, taking the I/O memory traffic burden off of the processor and the high-privilege software can offer significant performance savings.
So, while TrustZone doesn't offer support for arbitrarily many virtual machines, it does support
two strongly isolated virtual cores with partitioned devices and independent memory
management facilities, as well as regulated paths for transition between the two worlds.
Potential TrustZone system designs for secure architectures, using various TrustZone-supportive
hardware components, can be broken down into different tiers, as described in [14]:
Tier One -- In this basic (and low-cost) mode, intended to support secure PIN
entry and payment protocols, the Secure world runs a Secure OS and the Normal
world runs an Open OS. The Open OS is running and controlling input
peripherals and the screen the majority of the time, but if secure entry of a PIN or
other data is required (especially in service of some type of payment transaction),
the Secure OS takes control of the input devices and the screen. The Secure OS
uses an isolated contiguous block of SRAM. It is booted with a trusted boot
process, whereby a hardware component boots a base OS, then loads the Secure
OS, which subsequently loads the Open OS.
Tier Two -- A superset of Tier One, Tier Two is intended to support DRM
applications. The Secure OS owns certain protected memory regions used for
DRM content, and if the Secure OS itself doesn't perform the decoding, then an
external chip or other peripheral can also be used (and access to this component
will be restricted to the Secure OS). Tier Two involves more complex control
capabilities over devices and I/O than Tier One, to safeguard the protected
content.
Tier Three -- Tier Three, a superset of Tier Two, is intended to offer full support
for ``cloud computing" ([14]) in which secure services run in a protected manner
in the Secure OS and untrusted data is received, processed, and distributed by the
Open OS. It increases support for device control, and adds the DMA controller as
well as additional acceleration mechanisms for securely, efficiently processing
DRM on large content files.
Tier Three can therefore also be used to support system virtualization. The hypervisor runs in the
Secure world, and a single guest runs in the Normal world. The DMA controller and device
control mechanisms support I/O virtualization, and the TrustZone isolation mechanisms and
interrupt handling support isolate the hypervisor from the guest.
6. Hypervisor-based security architectures
6.1. Advantages
Virtualization serves as a powerful enabler for security services and security architectures, due to
the hypervisor's minimized TCB, the isolation enforced between hypervisors and guests, and the
hypervisor's presence in a higher hardware protection zone than the guest(s). Security services
based on a hypervisor have excellent visibility into guests, yet are still securely protected from
guests -- this overcomes the problems inherent in traditional architectures such as intrusion
detection systems, where the security service is either remotely located (with greatly reduced
visibility) or located on the monitored system itself (with greatly increased vulnerability to
attackers) [39].
Due to modern operating systems' bulk and complexity, and abundance of continually unearthed
critical security flaws, security services implemented by such OSs may not be trustworthy. In
fact, the OSs themselves may not be trustworthy. Hypervisor-based security services can be
externally applied, in some cases to totally unmodified guest OSs, and thereby bring more
trustworthy security. This can provide protection for the guest OS from its applications, for the
guest applications from each other, and even for guest applications from the guest OS.
Implementing secure services through hypervisors and virtualization also benefits from
virtualization's inherent modularity. Services can potentially be reused for different guests and on
different hardware platforms. This could facilitate, for example, a company enforcing consistent
security policies efficiently on a wide variety of systems.
6.2. Virtualization security challenges
While offering clear benefits, virtualization also creates security-related challenges that must be
considered when implementing hypervisor-based secure architectures.
Virtualization is simpler when it concerns strictly isolated virtual machines -- but what about
when VMs must cooperate? Bellovin discusses the difficulties in defining the interfaces and
interactions between VMs, and how this breaks pure isolation and introduces problems [22].
Indeed, as shall be discussed later, there are many emerging scenarios (particularly in mobile
platforms) where isolated domains must cooperate in some fashion, and in such cases some sort
of mandatory access control, information flow control, or other mechanisms must ensure the
security of the interactions and the protection of important resources in the system.
Garfinkel and Rosenblum enumerate a number of potential security problems introduced by
virtualization [40]:
Scaling -- Virtualization enables rapid creation and addition of new virtual
machines. Without total automation, this dynamic growth capacity can destabilize
security management activities such as system configuration and updates,
resulting in vulnerability to security incidents.
Transience -- Whereas normal computing environments/networks tend to
converge on a stable state, with a consistent collection of machines, virtualization
environments can have machines that quickly come and go. This can foil attempts
at consistent management and leave transient VMs vulnerable to, or infected by, a worm that goes undetected. Infections can
persist within such a fluctuating environment and be difficult to stamp out.
Software lifecycle -- Since a VM's state is encapsulated in the VMM software
(along with any supporting hardware), snapshots of state can easily be taken. A
VM can be instantiated from a prior snapshot, enabling easy state rollback -- this
can interfere with assumptions about the lifecycle of running software. For
example, previously applied patches or updates may be lost, or VMs that accept
one-time passwords may be made to re-accept used passwords. If rolled back state
causes the reuse of stream cipher keys or repetition of other cryptographic
mechanisms that shouldn't be reused in an identical fashion, cryptosystems may
be compromised.
Diversity -- Increased heterogeneity of operating systems and environments will
increase security management difficulties, and present a more varied attack
surface.
Mobility -- While also cited as an advantage of virtualization, mobility and
migration automatically engender more complexity and security issues. Moving a
VM across different machines automatically increases that VM's TCB to include
each one of those machines -- therefore increasing security risk, and in a dynamic
environment, potentially making it harder to track which VMs may have been
exposed to physical machine compromises. It also poses the danger of moving
VMs from an untrusted environment (such as a home machine) to a trusted
environment, and makes it easier for a malicious insider to steal a machine (since
a machine is simply a file on a disk).
Identity -- Static means of identifying machines, such as MAC addresses or
owner name, may not function with virtualization. Machine ownership and responsibility are harder to track in a dynamic virtualized environment.
Data lifetime -- Guest OSs may have security requirements about data lifetime
that are invalidated by a VMM's logging and instruction replay mechanisms;
through external logging facilities, combined with VM mobility, it is possible that
sensitive data may be left in widely distributed persistent storage.
Nichols echoes the configuration and management difficulties, and highlights other virtualization
security issues in [90]. For instance, virtual networks, whose traffic is routed internally within a
physical machine, won't be protected by all the usual physical network security mechanisms,
allowing attacks to be mounted and spread. Furthermore, attacks on VMMs yield a bigger payoff
than attacks on traditional OS platforms, since a VMM can control multiple virtual machines (and possibly
a varying collection over time), so any hypervisor vulnerability becomes extremely critical.
Nichols also mentions how security and management tools supporting virtual environments in
general are not yet mature, due to the relatively recent gains in virtualization popularity.
Measures may have to be taken to address these challenges, depending on local requirements.
Fortunately, ultra-thin, single guest, monitoring/enforcement-oriented hypervisors are not
affected by many of these concerns -- their small code size lessens the likelihood of hypervisor
compromise, with a single guest and no hypervisor network presence there is no virtual network,
and they do not support the complex management features (mobility, transience) that result in
security difficulties. However, they may create some additional management complexity simply
because of the increase in individual system complexity. Also, should such a monitoring
hypervisor be made to sit beneath a traditional VMM, some of these issues may of course need to
be addressed again.
6.3. Architectural limitations
Hypervisor-based security services are not a panacea. There are limitations to what can be
accomplished.
6.3.1. The semantic gap
Hypervisor-based services, running external to and at higher privilege than the guest OSes,
have complete access to guest memory, but do not have intimate access to guest OS services and
context. They have total visibility into the guest, and have the capacity to see all guest memory
pages, but they do not have interactivity with guest ABIs, APIs, and abstractions. To have
understanding of guest state, the hypervisor (or the service running on it) must somehow bridge
the so-called semantic gap -- the gap in understanding between the hypervisor's view and the
guest OS state. Without additional facilities to bridge this gap, the hypervisor will see guest
memory, but it will be a meaningless jumble of values. The hypervisor must be endowed with
relevant structural, contextual knowledge of the particular guest OS in question.
This is important for security because many security services must have accurate understanding
of relevant guest state to implement meaningful functionality. Without such knowledge, a service
won't know what is happening in a guest nor will it be able to make reasonable deductions,
decisions, or actions based on guest state. Such a service must have processing facilities capable
of mapping in guest pages and then interpreting the pages to divine the current relevant state
from raw guest memory. Different services may require knowledge of different aspects of guest
state.
Using the hypervisor's view into its VMs coupled with contextual knowledge and processing
facilities to interpret guest OS state is known as VM introspection (VMI); introduced in the
Livewire system [39], it is an established technique, but increases the complexity of the security
services code, and furthermore introduces management issues since the knowledge base must
remain updated in parallel with any relevant updates to the monitored guest OS.
VMI could be divided into two areas -- inspection, and interpretation (or semantic
reconstruction). Inspection is the process of actually mapping the proper guest pages into
hypervisor memory. Interpretation is the process of comprehending those pages. There are VMI
frameworks in existence such as the publicly available XenAccess [66], as well as the as yet
unreleased VIX toolkit (also for Xen) [43], that attempt to provide extensible foundations and
tools for VMI. VIX, for instance, contains a set of Unix-like utilities built over an inspection
library that can be used from a Xen administrative domain to examine a running virtual machine
-- this may reveal relevant forensics data, or discrepancies between the guest OS and VMM
views due to malware such as rootkits. XenAccess provides an API for mapping and inspecting
guest pages from an observer domain, and some examples of how to use the API and interpret
guest memory. More advanced, context-specific modules for interpreting state can be built above
XenAccess.
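In rough outline, a VMI operation combines both steps. The sketch below is hypothetical and does not reflect the actual XenAccess API -- map_guest_page() stands in for the inspection step, and the structure offset is an invented piece of guest-specific knowledge used for interpretation:

    /* Hypothetical VMI flow: inspect (map a guest page), then interpret
     * (apply knowledge of the guest OS's in-memory structures). */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define TASK_NAME_OFFSET 0x238   /* invented guest-specific offset */

    static uint8_t fake_page[4096];

    /* Inspection: obtain a view of the guest page backing a guest
     * virtual address (stubbed; a real framework maps it via the VMM). */
    static void *map_guest_page(int vm_id, uint64_t guest_vaddr)
    {
        (void)vm_id; (void)guest_vaddr;
        return fake_page;
    }

    /* Interpretation: extract a process name from a task structure. */
    void print_process_name(int vm_id, uint64_t task_struct_vaddr)
    {
        uint8_t *page = map_guest_page(vm_id, task_struct_vaddr);
        char name[16];
        memcpy(name, page + TASK_NAME_OFFSET, sizeof name - 1);
        name[sizeof name - 1] = '\0';
        printf("guest process: %s\n", name);
    }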
6.3.2. Interposition granularity
For performance reasons, as many guest instructions as possible run directly on the hardware.
However, as we know, certain instructions and events must trap to and be handled by the
hypervisor so it can enforce virtualization, isolation, and so forth. The granularity of events on
which the hypervisor can interpose is limited by the hardware interface. The ability to handle
events by immediately trapping to hypervisor control is sometimes called active monitoring,
since the hypervisor and the security service can guarantee active response to supported events,
as opposed to passive monitoring, wherein guests are periodically monitored at the discretion of
the hypervisor-based monitoring service. Passive monitoring by the hypervisor can't guarantee
discovery of problems resident in unmonitored state or conditions that can hide or change
between monitoring cycles, and can't support immediate prevention or handling of events or
negative conditions as they arise.
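The limitation of passive monitoring is visible even in a trivial sketch -- anything that changes and reverts inside the sleep window is never observed (check_kernel_integrity() is a stand-in for any hash or invariant check):

    /* Passive monitoring loop: periodic checks, with blind spots between. */
    #include <stdbool.h>
    #include <unistd.h>

    static bool check_kernel_integrity(void)
    {
        /* Stand-in: hash known-good kernel regions and compare. */
        return true;
    }

    void passive_monitor(void)
    {
        for (;;) {
            if (!check_kernel_integrity()) {
                /* react: pause the VM, raise an alert, ... */
            }
            sleep(1);  /* unmonitored window: transient attacks can hide */
        }
    }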
The hypervisor can handle any event that can be made to trap to the hypervisor's high privilege
mode, possibly including privileged instructions, memory accesses, device operations,
exceptions/interrupts, or other conditions. Without special virtualization support in hardware, the
range and specification of traps may be more limited. In either case, the hardware-supported
granularity may not be sufficient for certain applications. For example, certain security
monitoring services may need to guarantee response to fine-grained guest events. This problem
can be alleviated by using already discussed techniques such as paravirtualization and
pre-virtualization, where the guest OS is made to use a hypercall interface, or binary translation,
where appropriate instructions are translated at runtime. These techniques suffer problems
already mentioned. Another method is to dynamically introduce hook code into the guest OS, but
this code, as resident on the guest and potentially vulnerable to guest compromise, comes with its
own security problems. The Lares system [67] uses carefully placed and VMM-protected hook
code injected into the guest OS to increase the active monitoring capabilities of the VMM.
Limitations on interposition granularity and the capacity for VM introspection are critical issues
for implementing security services, and any improvement to either area will enhance the
possibilities for virtualization-based security architectures.
6.4. Architectural patterns
When designing virtualization-based security services (that is, security services that run atop a
hypervisor and operate on guest domains), there are basic architectural/design patterns that may
be followed.
6.4.1. Augmented traditional hypervisor
One method for implementing security services using traditional system virtualization
hypervisors is to implement the services in the hypervisor itself. This may be convenient for
development, especially if the code of the hypervisor is readily available and already understood.
However, it poses the major disadvantage of adding to the complexity and code size of the
hypervisor, which counters one of virtualization's fundamental strong points -- the minimized
TCB presented by the hypervisor. Therefore, it is most likely advisable to take a different
approach.
6.4.2. Security VM
With traditional hypervisors, it is quite common to implement security services in a specially
designated security VM, similar to Xen's administrative domain ``dom0". Through this approach,
the security services run in a special VM granted all the necessary privileges by the hypervisor,
presumably running a stripped-down operating system specially crafted for the security
services. The VMM/hypervisor must be modified only to the extent that it can communicate with
the security VM and provide it with the privileges and resources it needs to implement the
security services. This approach, while probably presenting more development overhead than
developing directly in the hypervisor, preserves the hypervisor's minimal TCB, and is
furthermore more modular (enabling the security services to be more easily modified,
transferred, or recombined in other systems in the future).
6.4.3. Microkernel application
In the case of a microkernel serving as a hypervisor, security services can be implemented in a
specially written microkernel application, which will run in its own protected address space. It
can connect to the VM provisioning layers using the microkernel's IPC services. The application
will have to run with sufficient privileges to implement the desired security services.
6.4.4. Thin hypervisor
Lastly, thin, single-guest hypervisors can be used to provide an ultra-low footprint monitoring
and enforcement layer between hardware and OS software for implementing security services.
The extremely small code size can lead to easier verification and hopefully therefore stronger
security and correctness. It is important to consider what types of services can be implemented
on which hardware platforms, and still maintain the ultra-low footprint.
6.5. Isolation-based services
We can now briefly examine some of the potential security services provided by virtualization-based architectures, of which there are many. They can be loosely divided into two categories -- monitoring-based and isolation-based services. Monitoring-based services focus on observing,
interpreting, and possibly responding to VM state, and may make heavy use of VM
introspection. Isolation-based services, on the other hand, leverage the hypervisor's high
privilege and interposition capability to isolate and protect system components and enforce
system security. Note that this distinction is not precise, and other categorizations are possible.
We will describe some isolation-based services first.
6.5.1. Isolation architectures
Figure 3: Domain isolation in a mobile device
While this section focuses on isolation-based security services, it is also worth discussing the
interesting possibilities for isolation architectures engendered by virtualization. Although
isolation of hosted domains is a given security advantage in virtualization, it bears deeper
investigation in specific contexts. For example, in a system that contains important components
and trusted and untrusted software, virtualization can be used to create a safer environment for
the trusted and critical components. Envision a mobile system containing trusted cellular
hardware (including the SIM card and cellular radio, which must be safe from compromise), a
trusted software stack that controls authentication and the critical hardware, an untrusted
software stack running user applications and accessing wireless networks (such as cellular,
802.11 or Bluetooth), a trusted hardware and software component for decoding and protecting
DRM content, and possibly other components that must be protected (such as a module for
storing private user information). While these components may have been initially contained in a
single domain/OS, hence each vulnerable to any compromise of the other, virtualization can
support such a scenario by isolating each component in its own domain [25] (see Figure 3). The
hypervisor-enforced isolation protects each domain from the compromise of other components
(and limits the damage should a component itself be compromised). The hypervisor
must provide secure communication facilities between domains, and possibly limit the
communication to only what is needed to support functional requirements. To illustrate the
advantages with an example from [32] -- if the device's Bluetooth implementation is
compromised (Bluetooth has been known to have security vulnerabilities [74]), user applications
may be vulnerable, but system authentication and the cellular radio will remain unharmed.
Therefore, virtualization can be used to partition a system into various isolated yet cooperative
domains and thereby increase security for the system as a whole, also reducing the TCB for the
most important components.
6.5.2. Kernel code integrity
There are multiple research systems supporting kernel code integrity.
First off, we have SecVisor [77], an ultra-thin hypervisor (only 1100 lines of code in the
presence of Intel-VT or AMD-V) supporting a single guest, ensuring that only user-approved
code is ever executed in kernel mode and that kernel code is not modified (except by SecVisor).
The system uses IOMMU support to prevent DMA writes to kernel code, page tables for
memory protection, and MMU support to virtualize guest OS memory to protect the page tables.
All hardware locations where kernel entry points are specified (such as the interrupt vector table)
are virtualized, so that SecVisor can always verify that kernel entries will go to a valid kernel
code location. When in kernel mode, user mode pages are marked non-executable, and vice
versa. This forces a trap whenever transitioning between modes, enabling SecVisor to switch the
set of pages marked non-executable. This trap also enables SecVisor to enforce, in the case of
transitioning from kernel code to user code, that the CPU switches to user mode. (Therefore, a
buffer overflow in the kernel can't be made to execute shellcode in a malicious user process.)
When marking pages executable, SecVisor also marks them read-only, so that code that can be
executed can't be modified. Furthermore, when entering kernel mode, SecVisor only marks as
executable those pages that are approved by the kernel code policy, so that execution of non-approved code will trap.
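The page-permission flipping that SecVisor performs on each mode transition can be modeled as follows (flat arrays instead of real page tables; names invented):

    /* Model of SecVisor-style permission flipping on mode transitions:
     * in kernel mode only approved kernel code is executable (read-only);
     * in user mode only user pages are executable. */
    #include <stdbool.h>

    #define NPAGES 1024
    enum mode { USER_MODE, KERNEL_MODE };

    static bool is_kernel_page[NPAGES];
    static bool approved_code[NPAGES];  /* user-approved kernel code policy */
    static bool executable[NPAGES];
    static bool writable[NPAGES];

    /* Invoked on the trap forced by executing a page of the other mode. */
    void on_mode_transition(enum mode new_mode)
    {
        for (int p = 0; p < NPAGES; p++) {
            if (new_mode == KERNEL_MODE) {
                executable[p] = is_kernel_page[p] && approved_code[p];
                if (executable[p])
                    writable[p] = false;  /* executable code is read-only */
            } else {
                executable[p] = !is_kernel_page[p];
            }
        }
    }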
A VMM-based kernel protection system, found in [104] and dubbed UCON KI (a usage control framework for kernel integrity), offers more flexibility. This system provides an access control
model based on subjects, objects, attributes, rights, and events. Subjects include processes and
loadable kernel modules, objects include kernel memory spaces and registers, attributes describe
subjects or objects, rights are actions on objects permissible by subjects, and events are key
points at which policy can be enforced by the system. Virtual machine introspection techniques
are used to determine subject attributes. Policy includes predicates describing whether rights are
to be granted or denied depending on events, subjects, objects, and attributes. One interesting
feature of the system is that rights and attributes are dynamic and mutable with continuity -- meaning that if an event happens which changes a subject's attributes, its currently granted
access rights may be revoked. There may be cascading rights evaluations from a single event.
The authors successfully used the system to summarily defeat a large collection of rootkits
attempting to modify the kernel. The flexibility of the system indicates it could be adapted and
expanded for further uses. In tests it was run on the Bochs emulator, but could be used with other
virtualization layers as well.
6.5.3. Memory multi-shadowing
The Overshadow system [28] runs in a VMM and protects applications on a guest OS from each
other and from the guest OS itself by using multiple views of guest application memory. To the
application, the real view of memory is presented. To other processes (including the OS), an
encrypted and integrity-protected view of the memory is presented. The crucial component in
this system is a protected shim that is inserted into protected applications at load time -- this shim
is needed to identify and maintain the context of each protected application, and is also used by
the Overshadow system to handle complicated operations such as marshalling system call
arguments and return values to enable safe transition of data across the application-OS protection
boundary. The shim uses a hypercall interface to communicate directly with the VMM.
Overshadow uses multiple page tables for an application (one with cleartext pages for the
application's own use, one with ciphertext for the use of the rest of the system), and any
protected page will only be present in one page table at any given time. Pages can be swapped to
disk in encrypted state, and encrypted data can be moved around by untrusted components. In
one limited sense, Overshadow is able to remove the OS from the application's TCB, in that the
OS can no longer inspect or tamper with application memory pages. The Overshadow system
was built on a VMware binary translation VMM, but it is pointed out that a smaller, higher
assurance VMM could have been used as well.
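The core multi-shadowing idea reduces to selecting one of two views of the same page depending on who is accessing it, roughly as in this sketch (context identifiers and layout are invented for illustration):

    /* Model of memory multi-shadowing: the owning application sees
     * cleartext, while the OS and all other contexts see an encrypted,
     * integrity-protected image of the same page. */
    #include <stdint.h>

    struct shadow_page {
        int     owner_ctx;          /* protected application's context id */
        uint8_t cleartext[4096];    /* view mapped for the owner          */
        uint8_t ciphertext[4096];   /* encrypted + MAC'd view for others  */
    };

    /* Only one view is ever mapped for a given accessor at a time. */
    uint8_t *view_for(struct shadow_page *p, int accessing_ctx)
    {
        return (accessing_ctx == p->owner_ctx) ? p->cleartext
                                               : p->ciphertext;
    }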
Another system dubbed Software-Privacy Preserving Platform (SP3) [105] (published at the
same time as Overshadow, in March 2008) also protects data secrecy for user applications,
including memory pages and even registers (the latter during context switches). However, unlike
Overshadow, SP3 instead relies on extensions to the page table and emulation of a modified x86
interface in the Xen hypervisor, and requires modification of guest code to utilize new virtual
instructions that prompt the hypervisor to invoke operations for creating and managing
protection domains. Fortunately, at least in the case of Linux, significant modifications to the
guest OS were not required. Protection domains can consist of one or more guest processes, and
memory for a domain is encrypted with a domain-specific set of keys. Furthermore, the
hypervisor maintains a cache of decrypted pages, to speed up memory accesses in cases when the
page has been already encrypted. So, while some features of this system are more developed than
Overshadow, it does require modification of guest code, which Overshadow managed to avoid.
6.5.4. Protecting against a malicious OS
In a follow up [70] to Overshadow, it is pointed out that many virtualization-based security
architectures have focused on isolating applications and domains, protecting memory, and other
such services, but have not addressed ``OS semantics". The authors highlight that in spite of
Overshadow's memory protection, it won't safeguard an application against its own
vulnerabilities, nor can it prevent a compromised and malicious OS from posing a serious threat.
For example, a malicious OS could grant multiple mutexes simultaneously, or simply refuse to
schedule a process, or carry out other nefarious activities that render applications useless.
Therefore, the authors suggest and motivate more developed system components that expand
Overshadow's model and take more aspects of security-critical functionality out of the hands of
the OS, protecting applications at the level of OS semantics.
6.5.5. I/O Security
BitVisor [78] is a thin hypervisor system that provides I/O security for a single guest OS. It relies
on modern virtualization hardware support (Intel VT or AMD-V). For example, it uses IOMMU
functionality to protect against DMA attacks, and I/O instruction trapping bitmaps to configure
which devices' instructions will trap to the hypervisor. It implements its services via what it
terms parapass-through drivers -- drivers that can be substantially smaller than usual device
drivers, since they only need to handle a small subset of normal driver functionality, namely the
control and data instructions. Handling the control instructions enables BitVisor to observe
device state, and handling the data instructions enables it to perform security operations on the
data. Such a parapass-through driver resides in the hypervisor layer. Most I/O instructions pass
through the driver directly to the hardware, but the control and data instructions are specially
handled. A test system was implemented using an ATA parapass-through driver to perform
encryption of stored data, a service that could be provided regardless of the guest OS.
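The parapass-through idea can be sketched as follows in C. This is not BitVisor's actual code; the ata_* helpers are invented stand-ins. Only control and data accesses receive special handling, and everything else is forwarded untouched:

    #include <stdint.h>

    /* Hypothetical helpers -- a real parapass-through driver would decode
     * the port, track the ATA command state machine, and encrypt sector
     * data before it reaches the device. */
    extern int  ata_is_control_port(uint16_t port);
    extern int  ata_is_data_port(uint16_t port);
    extern void ata_track_device_state(uint16_t port, uint32_t val);
    extern uint32_t ata_encrypt_outgoing(uint32_t val);
    extern void pass_through_out(uint16_t port, uint32_t val);

    /* Called when a guest OUT instruction traps via the I/O bitmap. */
    void on_guest_out(uint16_t port, uint32_t val)
    {
        if (ata_is_control_port(port)) {
            ata_track_device_state(port, val);  /* observe device state */
            pass_through_out(port, val);
        } else if (ata_is_data_port(port)) {
            /* Perform the security operation on the data. */
            pass_through_out(port, ata_encrypt_outgoing(val));
        } else {
            pass_through_out(port, val);        /* all else passes through */
        }
    }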
6.5.6. Componentization
The Nizza system [47] is based on a L4-microkernel variant and provides a way to decompose an
operating system and its applications into critical/secure and non-critical components, reducing
the TCB for applications and even removing the OS from the TCB. Security critical components,
such as for sealed storage, cryptography, and ensuring application isolation within a GUI, are run
as microkernel applications. These components are loaded by an additional ``loader" microkernel
application. The guest OS may be paravirtualized to run on the microkernel, or may run above a
VM provisioning layer. Applications and the guest OS then rely on the isolated, minimized
microkernel components for secure functionality. They connect to these services using the IPC
system call interface exposed by the microkernel.
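As a rough illustration of this pattern, the sketch below shows an application sealing a secret via an isolated storage service over microkernel IPC. The ipc_call and lookup_service wrappers and the message layout are invented stand-ins; the real L4 system call interface differs.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical IPC primitives standing in for the microkernel's
     * system call interface. */
    typedef int thread_id_t;
    extern int ipc_call(thread_id_t server, const void *req, size_t req_len,
                        void *reply, size_t reply_len);
    extern thread_id_t lookup_service(const char *name);

    /* The application seals a secret via the trusted storage component;
     * the (untrusted) guest OS never handles the cleartext. */
    int seal_secret(const uint8_t *secret, size_t len)
    {
        thread_id_t storage = lookup_service("sealed_storage");
        struct { uint32_t op; uint8_t data[256]; } req = { .op = 1 /* SEAL */ };
        if (len > sizeof(req.data))
            return -1;
        for (size_t i = 0; i < len; i++)
            req.data[i] = secret[i];
        int status;
        return ipc_call(storage, &req, sizeof(req), &status, sizeof(status));
    }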
The Nizza system does require potentially extensive modification to guest software, but presents
a compelling method of drastically reducing application TCB. In comparison to the Nizza
system, Overshadow attempts a similar (albeit lesser) goal without requiring guest modification,
which leads to more performance and implementation challenges on the VMM side.
6.5.7. Mandatory Access Control (MAC)
MAC policies such as Bell LaPadula, the Biba integrity model, and the Chinese Wall model [10]
can offer stronger security for critical applications. With hypervisor-based MAC, the benefits of
MAC can be brought to existing systems and architectures, and enable greater security for virtual
domains. The sHype system [73] brings MAC to the Xen hypervisor. It operates at the granularity
of VMs and the shared VM resources (event channels and shared memory) used by Xen
guest device drivers, enabling the mentioned MAC policies and others to be applied to domains
and their interactions. This can facilitate a secure VM coalition as earlier described, where
domains cooperate securely to achieve the system goals.
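A minimal sketch of such a hypervisor MAC hook, assuming an invented label representation and a simplified Chinese Wall style conflict rule (sHype's actual policy engine is more general), might look as follows:

    #include <stdbool.h>

    #define MAX_TYPES 16

    /* Hypothetical label: the set of workload types a domain carries. */
    typedef struct { bool type[MAX_TYPES]; } label_t;

    /* Conflict sets: pairs of types that must never share a channel
     * (a simplified Chinese Wall style rule). */
    static const int conflict[][2] = { { 3, 7 }, { 2, 5 } };

    /* MAC hook consulted before the hypervisor connects two domains
     * with an event channel or a shared memory grant. */
    bool mac_may_communicate(const label_t *a, const label_t *b)
    {
        for (unsigned i = 0; i < sizeof(conflict) / sizeof(conflict[0]); i++) {
            int t1 = conflict[i][0], t2 = conflict[i][1];
            if ((a->type[t1] && b->type[t2]) || (a->type[t2] && b->type[t1]))
                return false;  /* conflicting workloads: deny the channel */
        }
        return true;           /* no conflict: allow the coalition */
    }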
The Xen Security Modules project [31] is still under development; it attempts to modularize the
application of MAC and other services for Xen. It provides a common framework whereby
different security services and models can be used depending on the situation. For instance, it
supports both sHype and Flask [82] modules.
6.5.8. Instruction set virtualization
The Secure Virtual Architecture (SVA) system [33] presents an interesting design where a
hypervisor layer exports a type safe instruction set interface for carrying out all the activities in
the system. The interface is divided into SVA-Core (which includes all instructions for typical
computation, including logic, arithmetic, memory allocation, function calls, branching, and other
instructions) and SVA-OS (consisting of privileged OS-only operations such as I/O and MMU
configuration that are typically implemented in assembly). All virtual machines must use this
interface. Operating systems that run in a virtual machine will have to be ported in three steps:
1. Port the platform-dependent portions of the kernel, including all assembly code, to
use the SVA interface (see the sketch following this list). The authors argue that this
is acceptable as a typical step for porting an OS, and furthermore may be easier for
SVA, since SVA's interface is higher level and more abstract than typical ISAs.
2. Make certain documented, specific changes to kernel memory allocators.
3. Optional modifications to the kernel to improve SVA performance.
Applications, on the other hand, typically need only be recompiled to take advantage of the
secure SVA-Core interface.
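As an illustration of step 1, the sketch below shows platform-dependent kernel code expressed through an SVA-OS style operation instead of inline assembly. The intrinsic name follows the naming style used in [33], but the signature here is simplified and illustrative.

    #include <stdint.h>

    /* Illustrative SVA-OS style intrinsic (the real operation's signature
     * may differ): the kernel asks the virtual architecture layer to
     * update a page mapping instead of writing page table entries itself. */
    extern void sva_update_l1_mapping(uint64_t *pte, uint64_t new_pte);

    /* Before porting, this would be raw, unverifiable MMU manipulation:
     *     *pte = new_pte;  asm volatile("invlpg (%0)" :: "r"(va));
     * After porting, the same operation goes through the type-safe
     * interface, letting the SVA layer check it before it takes effect. */
    void kernel_map_page(uint64_t *pte, uint64_t new_pte)
    {
        sva_update_l1_mapping(pte, new_pte);
    }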
The system uses a ``safety checking compiler" to compile guest code to produce SVA bytecode,
whose safety properties are then checked at load time by a Java-reminiscent ``bytecode verifier".
This process can occur offline, combined with digital signatures to authenticate the verification.
A runtime translator converts the bytecode into native machine instructions. Since SVA can
manage all critical system operations via its type safe interface, it can provide security
guarantees for the guest systems (even though the guest kernel is probably written in C),
including control flow integrity, type safety for certain types of objects, array bounds safety, no
dereferences of uninitialized pointers, and no double frees, among others.
In a sense, SVA is like ``Java for operating systems" in that safety guarantees are enforced and
software isolated by a virtualization layer -- but it is quite interesting to consider how this system
facilitates bringing such guarantees to legacy systems implemented in unsafe languages with
arguably reasonable porting cost. It in effect creates a new interface layer between the ISA and
the ABI.
6.6. Monitoring-based services
Now we shall discuss some monitoring-based services presented in research. Monitoring services
also leverage the hypervisor's high privilege, but focus more on observing, interpreting, and
possibly responding to guest state. Monitoring services may operate at a higher level of
abstraction than isolation services, and require knowledge and interpretation of higher level guest
OS abstractions.
6.6.1. Attestation
Hypervisors, in their high-privilege position, can be used to attest to guest code integrity and
state. This, of course, aligns with the Trusted Computing Group (TCG) and their architectures
for remote attestation. An emerging potential area for attestation is on mobile devices; the TCG
has released a mobile platform specification [87], and virtualization may possibly be used to
fulfill this specification. While SELinux has already been used to do so [1][106], to our
knowledge virtualization has not. Discussion on utilizing ARM TrustZone technology to
facilitate Trusted Computing is found in [102].
An early VMM-based system for attestation was Terra [38]. Terra, using a ``Trusted VMM"
coupled with a management VM, supports open and closed box domains, sealed storage, and
remote code attestation for domains. If a domain is designated as closed-box, Terra gives it
stronger isolation -- in addition to standard memory isolation, it will provide privacy and
integrity protection for stored data, thus sealing it off from observers. Closed-box domains can't
even be examined by the system owner. A closed box domain can approximate a proprietary
closed box system such as a hardware component or custom embedded system. Suggested
examples of such systems are game consoles, ATMs and mobile phones. Terra was implemented
using VMWare GSX Server, with a management VM that is charged with allocating resources
(memory, disk, devices) as well as setting up connections between VMs. It is remarked that, as
with Overshadow, a higher assurance VMM could be used in production environments. Due to
Terra's support for closed-box domains and sealed storage, it could be argued that it also
provides isolation-based services, but it was placed in the monitoring section due to its
attestation and trusted computing emphasis.
6.6.2. Malware analysis
Numerous virtualization-based systems for malware analysis have been presented. Two
examples will be discussed.
Firstly, the Patagonix system [59] is interesting because it attempts to dispense with the semantic
gap in a unique way. It tracks code execution by using generic hardware mechanisms that remain
consistent regardless of OS differences. By setting the non-executable (NX) bit on all
pages, any code execution traps to the hypervisor, whereupon the page can be inspected. (Code
need only be inspected when it first runs, or after it is modified.) Hardware-stored data such as
addresses of page tables themselves is used to differentiate between execution contexts. The
system uses a database of known good binaries (including Windows and Linux kernel binaries)
to check the identity of executing code. This database is the only aspect of the system that is
OS-dependent, and since it is decoupled from the implementation of the system (and it is arguably
much easier to acquire system binaries than to implement system-dependent logic), the system's
generic convenience is maintained. The results of the identity checking are sent to the user, who
can compare Patagonix's report on currently executing code with the report issued by the OS
itself, and thereby detect covert executions like rootkits. The system successfully detected all
rootkits tested on it. So long as a sufficient database of known-good binaries for the guest in
question is available, the system can support any guest.
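The core loop can be sketched as follows (hypothetical C; the helper functions are invented): every page starts non-executable, and code is identified the first time it runs.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helpers: hash a guest page and look the hash up in the
     * database of known-good binaries (the only OS-dependent component). */
    extern void hash_page(uint64_t gpa, uint8_t out[32]);
    extern bool known_good(const uint8_t hash[32], char name[64]);
    extern void report_unknown_code(uint64_t gpa);
    extern void clear_nx(uint64_t gpa);  /* allow execution after checking */
    extern void set_nx_all_pages(void);  /* initially mark everything NX   */

    /* Invoked by the hypervisor on an instruction-fetch fault caused by
     * the NX bit: code is inspected only the first time it runs. */
    void on_exec_fault(uint64_t gpa)
    {
        uint8_t digest[32];
        char name[64];

        hash_page(gpa, digest);
        if (!known_good(digest, name))
            report_unknown_code(gpa);  /* user compares with the OS view */
        clear_nx(gpa);                 /* re-armed if the page is modified */
    }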
Another system described here [51] offers broader malware detection support, but is more
heavily dependent on VM introspection. It uses VM introspection and semantic reconstruction to
capture the relevant state of an observed system (files, processes, etc.). This state can be
compared with the state reported by the operating system to detect discrepancies. The semantic
reconstruction facilities also enable the system to run existing malware detection utilities
externally on a VM, potentially even facilitating the use of utilities written for one platform to
scan a different platform. The system supports multiple VMMs, including Xen, VMware,
UserMode Linux and QEMU.
6.6.3. Intrusion detection
Another natural virtualization monitoring service is intrusion detection. The previously
introduced Livewire system [39] was the seminal use of VM introspection. It consists of a
management VM, running both a policy engine and a semantic reconstruction component that
used standard crash dump utilities on guest pages to analyze system state. A later system,
Introvirt [52], supports an interesting feature whereby exploit-specific predicates (possibly
written by a software patch author) can be used to provide perfect detection of the occurrence of
the exploit. To bridge the semantic gap between predicates and guest software, and enable
predicates to be highly expressive, the system can execute existing guest code (such as system
calls or application functions) in the guest address space. To prevent modification to guest state
as a result of executing the guest code, the system supports rollback functionality.
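The following sketch suggests what such a predicate might look like, assuming an invented introspection API (vmi_checkpoint, vmi_call_guest, vmi_rollback); Introvirt's real interface differs.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical introspection API: checkpoint guest state, run existing
     * guest code (e.g. a system call) inside the guest address space, and
     * roll the guest back so the probe leaves no trace. */
    extern void     vmi_checkpoint(void);
    extern void     vmi_rollback(void);
    extern uint64_t vmi_call_guest(const char *function, uint64_t arg);

    /* A predicate a patch author might ship: "is this process's object in
     * the exploited state?" -- evaluated using the guest's own code, then
     * undone via rollback. */
    bool exploit_predicate(uint64_t pid)
    {
        vmi_checkpoint();
        uint64_t state = vmi_call_guest("query_object_state", pid);
        vmi_rollback();             /* undo side effects of the probe */
        return state == 0xBAD;      /* illustrative trigger condition */
    }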
6.6.4. Forensics
Virtualization-based forensics services enable new possibilities for live forensics analysis. While
offline analysis can accommodate many forensics applications, volatile and dynamic system state
can only be obtained via live analysis of a running system under attack. Traditionally however,
live analysis presents difficulties since the presence of the forensics investigator might be easily
discerned by an attacker, and other aspects of system state may be affected by the investigator's
presence. In the previously cited system using the VIX toolkit [43], safe live analysis is enabled
via virtual machine isolation and introspection. The system runs in a Xen administrative domain,
and data is therefore gathered externally to the monitored user VM. While the authors hope the
system is undetectable, they acknowledge that an attacker may be able to conclude, via
timing/performance analysis or other circumstantial techniques, that the system is being monitored.
It has been suggested that running such a forensics system on its own core in a multicore system
might lessen the potential for timing analysis, but it may still be necessary to ``freeze" the
monitored system in certain moments to gather state information.
6.6.5. Execution logging and replay
Another apt and canonical use of virtualization's monitoring possibilities is to log and replay VM
execution. The ReVirt system [35] enables complete logging and replay of VM execution, and
since it is VMM-based, the logging will persist in periods before, during, and after guest attacks.
Then, if an attack is discovered, the incident can be replayed in exactitude to ascertain its source,
cause, effects, and so on. It can also be used to generally audit system activities. ReVirt can
naturally enhance or be combined with intrusion detection, malware analysis and forensics
services.
To reconstruct execution completely, instruction by instruction, ReVirt must log all non-deterministic events and data, and it does so with reasonable performance. Non-deterministic
events that must be logged include device input and system interrupts -- fortunately, such events
can be handled by the VMM.
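A log record for such events might, as a sketch, pair each event with the guest instruction count at which it was delivered, so that replay can re-inject it at exactly the same point. The layout below is illustrative, not ReVirt's actual format.

    #include <stdint.h>
    #include <string.h>

    typedef enum { EV_INTERRUPT, EV_DEVICE_INPUT } event_kind_t;

    /* One replay log record: the retired-instruction position pins the
     * event to an exact point in the guest's execution. */
    typedef struct {
        uint64_t     instr_count;  /* position in the instruction stream */
        event_kind_t kind;
        uint32_t     vector;       /* interrupt vector, if EV_INTERRUPT  */
        uint32_t     len;          /* payload bytes, if EV_DEVICE_INPUT  */
        uint8_t      payload[64];
    } log_record_t;

    extern void log_append(const log_record_t *rec);

    /* Called by the VMM when it injects an interrupt into the guest. */
    void log_interrupt(uint64_t icount, uint32_t vector)
    {
        log_record_t rec;
        memset(&rec, 0, sizeof(rec));
        rec.instr_count = icount;
        rec.kind = EV_INTERRUPT;
        rec.vector = vector;
        log_append(&rec);
    }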
SMP-ReVirt [36] brings the same complete logging and replay functionality to multiprocessor
systems, and must deal with such challenges as shared memory (since the order of operations on
such memory by different cores must be preserved), which can introduce significant performance
overhead over single-processor ReVirt.
6.7. Alternatives
What other alternatives are out there for implementing security services in a way that is isolated
from yet with high-privilege visibility into the monitored system? We have already mentioned
Flask/SELinux as possible alternatives ([1][106]), although we also saw with Xen Security
Modules [31] that Flask may complement rather than supplant virtualization.
Another possibility is enforcing security via FPGAs. [26] proposes a solution where an FPGA is
used to enforce a configurable security policy in a high-performance hardware-based manner.
Other dedicated hardware security modules may be able to offer specific high-assurance security
services, such as storage or I/O encryption modules (as in the venerable BLACKER [97]),
tamper proof smart cards for a variety of cryptography and authentication applications, or
Trusted Platform Modules (TPMs) for sealed storage, attestation, and other uses. Of course, any
of these hardware solutions could also be combined with virtualization.
7. Multicore systems
7.1. Why multicore?
An excellent overview of multicore hardware today, including hardware and software concerns
and challenges, is found here [23]. Additionally, this article [68] also discusses contemporary
multicore developments, and furthermore merges the discussion with description of
virtualization concerns and opportunities on multicore.
As noted in [23] [68] and elsewhere, the advent of ubiquitous multicore is due to the megahertz
plateau in CPU development. Heat and power consumption curves increase beyond tractable
levels when CPU clock speeds are pushed beyond their current leveling-off capabilities. New
methods were needed to increase performance, among them the following (as described in [23]
[68]):
Increase the L2 cache size. The benefits of this strategy can only be as great as the
losses due to L2 cache misses, which vary from context to context. Note that
increasing L1 cache size is not recommended, since making the L1 cache too
large would have a negative impact on clock frequency.
Exploit instruction level parallelism (ILP) by having the CPU execute
parallelizable instructions simultaneously. The benefits of this technique are
limited by the inherent parallelism in the instruction set and the executing
program, and it must be balanced with the resultant complexity in hardware
needed to detect and exploit ILP.
Increase use of pipelining, wherein multiple instructions are piped through the
different stages of an execution cycle one after the other, so that overall
throughput is increased. Multiple instructions can be active in different stages of
processing, instead of the processor having to complete the execution of an
instruction before starting a new one. However, this approach can increase
processor complexity, as well as increase the time for a single instruction to be
processed.
Simultaneous multithreading (SMT), also called Hyperthreading on Intel
platforms, where a single core with multiple functional units can execute multiple
threads simultaneously.
Multicore CPUs, where multiple cores are located on a single chip.
All these techniques have of course been used.
In particular, also noted by [23] [68] and elsewhere, multicore CPUs create challenges for both
software and hardware. The dominant Von Neumann hardware architecture, with a uniform
memory space accompanied by input, output, and a sequential processing unit, lends itself to
single processor systems. The creators of the Barrelfish multicore operating system agree that OS
designers, in spite of the considerable differences between multicore and single core hardware,
still think of systems in a Von Neumann way (in part due to the continuance of laborious cache
coherence mechanisms) -- continuing to see a system with a uniform computation and memory
architecture [21]. There are many new hardware-related questions that must be addressed in
order to create efficient and suitable multicore systems. In addition, common software
development models, as an outgrowth of the sequential Von Neumann instruction architecture,
are not well suited to parallel programming. Software developers in general do not have the tools
or knowledge to leverage parallelism in most types of software, presenting a formidable obstacle
to the fruitful use of multicore hardware. The following subsections will discuss these issues.
7.2. Hardware considerations
In some situations, multicore hardware might seem to be a simple extension of single core. For
example, in a basic dual-core situation, the two cores might have private L1 caches, but share the
L2 cache and the communication interfaces. The rest of the system might be the same. However,
as system complexity increases, such as in a many-core hardware platform like Tilera's
TilePro64 [86], a broad spectrum of issues come to light. A range of hardware concerns in
multicore systems is illustrated in [23] (Section 2.1), and summarized in the following
subsections.
7.2.1. Core count and complexity
The number of cores that a system should have is directly related to the parallelism in the
expected workload. If performance gain for adding cores is not linear, it is most likely better to
focus on increasing the performance capacity of each of a few cores. If on the other hand
performance gain for adding cores is expected to be linear, then more cores are most welcome.
However, here an interesting phenomenon takes hold, where the spatial area of the chip must be
considered -- performance gains resultant from adding any complexity to the chip must be
proportional to the increase in chip spatial area (that is, the gain should be at least as substantial
as the area increase), or else the better path is to simply add additional chips. This concern is
only the first way in which we will see that physical size and layout affect multicore systems.
7.2.2. Core heterogeneity
Cores in a multicore system may be homogeneous (identical) or heterogeneous by design. There
may also be a distinction where cores implement the same instruction set, but have differing
assemblies of functional units or other components. For generic, non-specialized workloads,
fully homogeneous cores (as found in Intel or Tilera processors) are advisable. However, in
specialized cases where the workload is expected to have characteristics appropriate to multiple
architectures, heterogeneity may be beneficial. The Cell processor is an example in which some
cores use different instruction sets than others. A common pattern in large heterogeneous core
systems is to have a small number of high performance cores that execute generic, non-
parallelizable workloads, and a large number of small cores usable for highly parallel workloads.
It must be noted that core heterogeneity can greatly complicate software development, and taking
full advantage of the available core palette can be challenging.
Core heterogeneity in general is more common in embedded systems than desktop systems.
7.2.3. Memory hierarchy
Memory hierarchy becomes considerably more complicated in a multicore scenario. Cores may
have internal memory for their own use, and they typically still have a private L1 cache. But
should cores share an L2 cache, or have private L2 caches as well? Should they share an L3
cache? How many cores should share each cache? Shared caches can result in better utilization
of hardware, and may create performance gains in situations where cores are sharing loads or in
other such circumstances, but sharing requires more costly external (off-core) communication,
and may hurt performance in other scenarios. It also decreases inherent isolation between cores
(which may be a security or reliability concern). Additionally, the less cache sharing, the more
complex the coherency maintenance mechanisms must be -- if each core has a fully private,
multi-megabyte L2 cache, and the system has many cores, maintaining coherency can be
daunting. On the other hand, sharing a cache between too many cores also becomes complicated
and costly. The problems will only increase as the number of cores increases.
The Tile64 is an example of a multicore CPU where each core in an eight-by-eight mesh has its
own L2 cache, and the chip even supports an additional ``dynamic distributed cache" (DDC)
comprising the caches of a core's neighbors [86].
7.2.4. Interconnects (core communication)
Cores in a multicore system need to communicate with each other. A primary reason is to
support cache coherency. We will not go in depth into the various possibilities for core
interconnection (such as crossbars, rings, meshes, and hierarchies) here, but as with other
aspects, the challenges of core interconnection increase with the number of cores, and physical
layout of the cores can become an important consideration.
7.2.5. Extended instruction sets
The x86 instruction set is firmly in place, and isn't going anywhere [68]; hence, though it may
not have been intended to support multicore from the beginning, it is necessary to use expanded
instructions that can support multicore. In general, if an ISA must continue to be used on
multicore hardware, it may be necessary to upgrade it with special instructions to support
multicore operations, especially specific instructions relevant to implementing
shared/transactional memory [23] (Section 2.3.1) or low-latency message passing. For instance,
memory shuffling instructions that can atomically read a location value and set a new value
based on a test predicate can be useful for synchronization.
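Such an instruction is essentially a compare-and-swap. The following small example, using standard C11 atomics, builds a spinlock on exactly this read-test-set primitive:

    #include <stdatomic.h>

    /* A minimal spinlock built on atomic compare-and-swap: read the
     * location, and set a new value only if it still holds the expected
     * one -- the "test predicate" style operation described above. */
    typedef struct { atomic_int locked; } spinlock_t;

    void spin_lock(spinlock_t *l)
    {
        int expected = 0;
        /* Retry until we atomically flip 0 -> 1. */
        while (!atomic_compare_exchange_weak(&l->locked, &expected, 1))
            expected = 0;  /* CAS updates 'expected' on failure; reset it */
    }

    void spin_unlock(spinlock_t *l)
    {
        atomic_store(&l->locked, 0);
    }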
7.2.6. Other concerns
Other issues, such as how the main system memory will be laid out, how it will interface with
the cores while maintaining sufficient bandwidth, and how many simultaneous threads
to support on each core, are important concerns with their own tradeoffs. For instance,
supporting more simultaneous threads on a core can increase the number of cache misses (since
multiple threads compete for the cache), but can overall increase performance and utilization
since the core's processing components will be used by other threads when a thread must wait for
a cache miss to be filled. Regarding memory interfaces, in some cases with large numbers of
cores, it may even become beneficial to forego the traditional strategy of having external
interfaces along the periphery of the chip and instead stack chips in a 3D manner [60].
7.3. Software considerations
Some would say that software is at the heart of the multicore problem, since all the advanced
hardware in the world isn't going to help if software isn't written to utilize multicore capabilities.
Software concerns in multicore systems are discussed in [23] (Section 2.2), and summarized in
the following subsections.
7.3.1. Programming models
The dominant imperative programming model, where instruction after instruction, function after
function are executed in sequence without easy support for concurrent programming and
synchronization and safe sharing of data, must be evolved to support multicore. But concurrency
and synchronization are not simple tasks -- for instance, concurrency vulnerabilities have been
discovered in system call wrappers (system call interposition layers/reference monitors intended
to support security) due to improper synchronization between the wrappers and the system calls,
among other causes [96].
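The following simplified sketch (invented function names) shows the shape of such a vulnerability: the wrapper checks an argument that still lives in user memory, leaving a window in which another thread can rewrite it before the kernel fetches it again.

    #include <stdbool.h>

    /* Illustrative wrapper logic showing the check-then-use window. */
    extern bool policy_allows(const char *path);
    extern long real_open(const char *user_path, int flags);

    long wrapped_open(const char *user_path, int flags)
    {
        if (!policy_allows(user_path))   /* check: reads user memory ...  */
            return -1;
        /* ... window: another core may rewrite *user_path right here ... */
        return real_open(user_path, flags); /* use: kernel reads it again */
    }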
A general strategy for how to handle interprocess (inter-core) cooperation and concurrent
programming must be settled on. The fundamental mechanism can be something along the lines
of shared memory, where cores synchronize and share access to regions of memory, or message
passing. Message passing may be more useful in situations where cores are more widely
distributed and do not have easy access to shared physical memory. If software such as the OS
kernel is to run on multiple cores, special care must be taken when synchronizing its data.
Firstly, though, one must note that programming may indeed proceed using the standard
sequential model, should a compiler be available that can automatically extract parallelism.
However, the parallelism to be found in common programs may be quite minimal, not to
mention difficult for a compiler to discover and articulate. Therefore, it is most likely necessary to
proceed with other approaches.
There are many programming models available to support concurrency and parallelism. The
dominant model is kernel threads (such as pthreads on Unix). Kernel threads are supported by
the OS, and hence are expensive to create and destroy. They may need to synchronize with other
threads using mutexes or other synchronization primitives or strategies. Kernel threads are a low
level primitive, and thus suitable for expert implementation -- including implementation of
additional higher-level programming models.
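A canonical kernel-thread example, two pthreads incrementing a shared counter under a mutex, runnable on any Unix-like system:

    #include <pthread.h>
    #include <stdio.h>

    /* Two kernel threads incrementing a shared counter, synchronized with
     * a mutex -- the dominant low-level model described above. */
    static long counter;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                 /* critical section */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);  /* always 2000000 */
        return 0;
    }

Without the mutex, the two threads race on the shared counter and the final value is unpredictable, which is exactly the kind of error the model leaves to the programmer.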
User-level threads, as opposed to kernel threads, are created and managed by user-level
processes. This can make them less expensive than kernel threads. However, they are far less
common than kernel threads.
In the Single-Program, Multiple Data (SPMD) model, the program is meant to be run identically
in multiple threads on multiple collections of data. This model could be seen as a master with
worker threads, where the master sends data to a force of identical workers who operate on the
data in parallel. It may be that the parallel workers collectively contribute to a greater result,
requiring concurrent operation. OpenMP is a programming language extension that was
originally implemented to support this model [34].
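A small OpenMP example in C (built with an OpenMP-enabled compiler, e.g. with -fopenmp) shows the SPMD flavor: the same loop body runs in every thread, each on its share of the data.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        /* OpenMP divides the iteration space among the worker threads and
         * combines the per-thread partial sums via the reduction clause. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
            sum += a[i];
        }
        printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
        return 0;
    }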
The task programming model is slightly different, in that a task is an independent unit of work
that may be executed in parallel, but doesn't have to be. Cilk is a task-oriented extension to the C
programming language [71].
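A classic task-parallel example, here written with Cilk Plus style keywords (the original MIT Cilk spelling differs slightly); each cilk_spawn marks work that may, but need not, run in parallel, which is exactly the independence property of the task model:

    #include <cilk/cilk.h>
    #include <stdio.h>

    long fib(long n)
    {
        if (n < 2)
            return n;
        long x = cilk_spawn fib(n - 1);  /* potentially parallel task */
        long y = fib(n - 2);
        cilk_sync;                       /* wait for the spawned task */
        return x + y;
    }

    int main(void)
    {
        printf("fib(30) = %ld\n", fib(30));
        return 0;
    }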
Domain-specific languages, as opposed to generic languages like C, C++, and Java, may provide
a deft approach to extracting parallelism from a workload, in that the specific parallelizable
qualities of the workload can be brought out and facilitated by the language.
Although there are clearly many alternatives for parallel programming, most development uses
kernel threads (and that only minimally), and the overall mentality of most software development
is, understandably, grounded in the sequential model.
7.3.2. Programming tools
Programming tools, including languages and debugging support, must meet the challenge of
multicore, multithreaded development.
Debugging of course becomes instantly more complex if there are multiple execution contexts in
a program. Concurrent programming gives rise to non-determinism as well as new error classes
such as deadlock. Debuggers meant for single-threaded development may be insufficient to deal
with such complexity, and programmers used to single-threaded development may not know
how to debug multithreaded programs.
Programming languages need to evolve to support multithreaded, multicore-friendly
development. As mentioned, extensions such as OpenMP and Cilk provide high-level
mechanisms for leveraging multicore parallelization. A difficulty here is that fundamental
change of programming languages and models takes significant time, and multicore hardware,
unfortunately, is being introduced into a legacy world filled with single-threaded code and
mentality.
7.3.3. Locality
The easy accessibility of memory to cores, whether from caches or system memory, is essential
for performance. The more that cores can be made to reference locally accessible memory, the
higher performance that can be attained. Different strategies can increase locality within a
specific core, or within an entire chip, with different tradeoffs. For example, if memory locality
can be increased within a chip, this might result in more communication between cores within
the chip as they share their caches, but less communication off the chip, the latter type of
communication being more expensive. Multicore systems in the future seem to be heading
towards more Non-Uniform Memory Architecture (NUMA)-like architectures, where physical
memory is more closely associated with individual multicore CPUs, in an effort to enhance
locality [68].
7.3.4. Load-balancing and scheduling
Scheduling of threads (including when and how often they are scheduled and how they are
distributed among cores) is clearly an important challenge in multicore. Different scheduling
policies can greatly influence system performance and properties, including (of course) locality.
Scheduling must also be considered in higher level models like tasks, where threads are but an
underlying entity.
7.4. Interesting multicore architectures
7.4.1. The Barrelfish multikernel
The Barrelfish operating system [76][20][21] is intended to deal with both increasing system
heterogeneity and the distributed nature of multicore hardware. The authors argue that OSs are
still being developed as if they are to be run on uniform CPU and memory architectures, but they
need to be rewritten to function well on, take advantage of and scale on new multicore hardware.
Additionally, with continual rapid, dynamic shifts in hardware technologies, increasing core
counts, and massive amounts of variety present in cores, devices, memory hierarchies, core
interconnects, and other hardware aspects, it is difficult for designers to optimize for certain
system configurations. Greater flexibility and management of diversity is required. To achieve
this, Barrelfish acknowledges modern computer systems as networked environments in their own
right and attempts to integrate distributed systems lessons in supporting dynamic, diverse,
adaptable, scalable systems. Introducing the concept of a multikernel [20], Barrelfish treats cores
as independent, isolated, distributed entities, capable of running independent software stacks and
communicating with each other via message passing/IPC. It is capable of managing
heterogeneous cores. The authors argue that handling shared state via message passing is less
expensive than using shared memory, and that by making the OS implementation as independent
as possible from the specifics of hardware implementation, it can remain easily adaptable and
scalable to new architectures. (Only the message passing mechanisms and the device- and CPU-specific interfaces are tailored to specific hardware.)
In a multikernel model, each core is intended to be a truly independent entity. In the Barrelfish
multikernel, each core has its own independent CPU driver running in privileged mode. CPU
drivers share no state with each other. This minimal driver is non-preemptible, and processes
traps and interrupts in serial. It does not perform inter-core communication. A user-level monitor
process also runs on each core, and is responsible for communicating with other cores and the
system and maintaining its own copies of any global system state. Processes in Barrelfish are
unconventionally implemented as a collection of ``dispatcher" objects. A process has dispatchers
situated on each core upon which it might execute. The CPU drivers schedule the dispatchers,
and then a dispatcher runs its own user-level thread scheduling on its own core.
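The replicated-state idea can be sketched as follows (invented structures; Barrelfish's actual agreement protocols are richer): each core keeps a private replica and learns of changes only through messages, never by reading another core's memory.

    #include <stdint.h>

    /* Per-core replica of a piece of global state, plus the update
     * message used to propagate changes. Transport is left abstract. */
    typedef struct { uint64_t free_frames; uint64_t epoch; } replica_t;
    typedef struct { uint64_t new_free_frames; uint64_t epoch; } update_msg_t;

    extern void send_to_core(int core, const update_msg_t *m);
    extern int  num_cores(void);

    static replica_t local;  /* this core's private copy of system state */

    /* Originating core: change local state, then broadcast the update. */
    void update_free_frames(uint64_t v)
    {
        local.free_frames = v;
        local.epoch++;
        update_msg_t m = { .new_free_frames = v, .epoch = local.epoch };
        for (int c = 0; c < num_cores(); c++)
            send_to_core(c, &m);
    }

    /* Receiving core: apply the message to its own replica. */
    void on_update(const update_msg_t *m)
    {
        if (m->epoch > local.epoch) {
            local.free_frames = m->new_free_frames;
            local.epoch = m->epoch;
        }
    }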
7.4.2. Configurable isolation
With cores sharing components such as caches, core interconnects, and external communication
interfaces, multicore hardware presents the potential for isolation problems. This may result in
security issues, where state leaks between execution contexts. It may also result in reliability
issues, since a failure in one core (or its components) may cascade into a failure in another core.
Furthermore, as hardware feature size and the space between components decreases, the
likelihood for hardware failure increases [9][98], meaning that in today's chips there is more risk
for such hardware failures. Therefore, in [9], the authors propose a configurable isolation model
where cores can be configured as fully isolated from each other and not sharing any unnecessary
components in critical scenarios, or can be allowed to share resources in the usual way in
common scenarios. The authors point out the need for fault isolation, detection, and repair, and
argue that such a configurable isolation system would support scenarios where either speed or
reliability are important concerns.
7.4.3. Mixed-Mode Multicore (MMM) reliability
For the same reasons, the authors in [98] provide a system for making flexible use of Dual-Modular Redundancy (DMR), in which a process is run simultaneously on multiple cores in
order to achieve greater robustness. The contribution of this research is the MMM scheme. On traditional
DMR systems, everything runs in DMR mode, which can be expensive. Under MMM, only
critical processes are run in DMR mode, while non-critical processes can execute normally. As
with the configurable isolation proposal, this system enables users to take advantage of
robustness or performance, depending on the needs of the situation.
7.5. Multicore and virtualization
As mentioned, a discussion of multicore and virtualization can be found in [68]. The article
highlights how virtualization provides a promising path for scalability -- indeed, if software
cannot generally be adapted to parallel models, then at least utilization of multicore hardware can
be achieved by housing multiple independent systems on it. It also emphasizes that virtualization
is good for locality -- if a virtual machine is assigned to a single chip or core, then its locality to
that chip or core will increase. Finally, it mentions that I/O requirements will be more complex
and critical in a multicore virtualization scenario, requiring channel and device assignments for
cores and VMs. Fortunately, I/O hardware support (Intel VT-d and AMD-IOMMU) seems to be
rising to the challenge.
Overall, virtualization seems like an apt way to leverage multicore hardware. Multiple VMs can
be hosted on a system, and users may benefit from adopting a new architectural perspective,
where they divide their system into categorized and trusted/untrusted domains (for example, one
domain for financial applications, one for games, one for office work, etc.). It has already been
mentioned how virtualization can abstract hardware differences, and thus facilitate smoother
transitions between platforms as hardware evolves. It appears a big win, so to speak.
However, before jumping ahead, we must consider some important issues. First, how does this
affect the security of the system? With more complex hardware and software, is it more likely
that the system will host critical vulnerabilities? Will the potential isolation and reliability
difficulties in multicore engender security liabilities? Furthermore, hypervisors must be
explicitly designed (and made more complex) to support multicore. These changes may make the
hypervisor -- the system TCB -- harder to verify. For example, the seL4 microkernel, as we saw,
is formally verified, but only for a single core environment! What kind of multicore support does
a particular hypervisor offer? How good is it at promoting fair and efficient scheduling, and
locality? One can't sidestep these issues when looking to multicore virtualization for answers.
7.5.1. Multicore virtualization architectures
There are a number of architectures in research that are specifically intended to enhance system
potential via multicore and virtualization. In this section we will discuss two of them.
7.5.1.1. Managing dynamic heterogeneity
The authors of [99] (who were also behind the MMM system above) propose a
system addressing an interesting problem. They point out that even if a multicore system has
physically homogeneous cores, those cores can exhibit widely varying runtime characteristics,
making them in effect heterogeneous. These characteristics can include thermal state, other
hardware strain, cache and TLB contents, and potentially other aspects, and altogether this
runtime heterogeneity can have a sizable impact on performance. The proposed system is a thin
hypervisor meant to run directly on the hardware and abstract and manage this multicore
dynamic heterogeneity, and thereby increase overall system performance. The hypervisor can
support different numbers of virtual cores than there are real cores, and can be run below a
guest OS or a traditional hypervisor and manage the heterogeneity for virtual machines. The
virtualization layer can also support MMM.
7.5.1.2. Sidecore
Another interesting architecture is Sidecore [56], whose authors make the observation that VM
entries and exits are expensive even with hardware support, and offer a solution to this problem
that leverages multicore hardware. They offer a system whereby the VMM functionality is
partitioned and partially assigned to specific cores in a multicore system. Then, those cores
(termed sidecores) will always run in VMM mode, thereby removing the need for VM entries
and exits for those cores. Using sidecalls for certain tasks, guest VMs or system devices can
communicate with the sidecores, rather than perform costly VM entries and exits to enter VMM
mode themselves. The paper includes experimental results highlighting the performance
advantages of implementing an operation via sidecalls instead of typical VM entries and exits.
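A sidecall can be sketched as a request posted in memory shared between the guest and the sidecore, as below (illustrative C; the slot layout and polling discipline are invented, and a production design would avoid busy-waiting):

    #include <stdatomic.h>
    #include <stdint.h>

    /* A request slot shared between the guest and the sidecore. */
    typedef struct {
        atomic_int status;   /* 0 = free, 1 = posted, 2 = done */
        uint32_t   op;
        uint64_t   arg;
        uint64_t   result;
    } sidecall_slot_t;

    /* Guest side: post the request and wait for the sidecore -- no
     * vmcall-style VM exit is taken. */
    uint64_t sidecall(sidecall_slot_t *slot, uint32_t op, uint64_t arg)
    {
        slot->op = op;
        slot->arg = arg;
        atomic_store(&slot->status, 1);       /* publish the request */
        while (atomic_load(&slot->status) != 2)
            ;                                 /* spin until serviced */
        atomic_store(&slot->status, 0);
        return slot->result;
    }

    /* Sidecore side: runs permanently in VMM mode, servicing slots. */
    void sidecore_poll(sidecall_slot_t *slot)
    {
        if (atomic_load(&slot->status) == 1) {
            slot->result = slot->arg + 1;     /* stand-in for real work */
            atomic_store(&slot->status, 2);
        }
    }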
The authors also cite many other supporting influences that facilitate or justify this sort of
architecture. For instance, it is reasonably argued that having cores specialize on portions of the
VMM will increase locality. They also suggest that, as in [55], assigning certain functionality to
specialized heterogeneous cores can increase performance, and that assigning cores will simplify
and enhance scalability for I/O in multicore virtualization systems. Finally, they cite evidence
that multicore architecture is moving towards high-performance inter-core communication [72],
as in AMD HyperTransport [3] and Intel QuickPath [48], which will further improve the inter-core communication latency of sidecore-inspired architectures.
8. References
[1] Onur Aciicmez and Afshin Latifi and Jean-Pierre Seifert and Xinwen Zhang. A Trusted
Mobile Phone Prototype. 5th IEEE Consumer Communications and Networking Conference
(CCNC), Las Vegas, NV, USA, 2008.
[2] Keith Adams and Ole Agesen. A Comparison of Software and Hardware Techniques for x86
Virtualization. The 12th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS), San Jose, CA, USA, pages 2-13, 2006.
[3] Advanced Micro Devices, Inc. AMD HyperTransport Technology page.
http://www.amd.com/us/products/technologies/hypertransport-technology/Pages/hypertransport-technology.aspx (accessed 12 Oct 2009), 2009.
[4] Advanced Micro Devices, Inc. AMD I/O Virtualization Technology (IOMMU) Specification.
http://www.mimuw.edu.pl/~vincent/lecture6/sources/amd-pacifica-specification.pdf (last accessed 24 Sept 2009), 2009.
[5] Advanced Micro Devices, Inc. AMD-V Nested Paging. available at
http://developer.amd.com/assets/NPT-WP-1 1-final-TM.pdf, 2008.
[6] Advanced Micro Devices, Inc. AMD-V Technology page.
http://www.amd.com/us/products/technologies/virtualization/Pages/amd-v.aspx, 2009.
[7] Advanced Micro Devices, Inc. AMD64 Virtualization Codenamed Pacifica Technology:
Secure Virtual Machine Architecture Reference Manual.
http://support.amd.com/us/Processor_TechDocs/34434-IOMMU-Rev_1.26_2-11-09.pdf (last accessed 24 Sept 2009), 2005.
[8] Advanced Micro Devices, Inc. Live Migration with AMD-V Extended Migration Technology.
available at http://developer.amd.com/assets/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf, 2008.
[9] Nidhi Aggarwal and Parthasarathy Ranganathan. Configurable Isolation: Building High
Availability Systems with Commodity Multi-Core Processors. The 34th Annual ACM
SIGARCH International Symposium on Computer Architecture (ISCA), 2007.
[10] Ross Anderson. Security Policies. available at
http://www.cl.cam.ac.uk/~rja14/Papers/security-policies.pdf, accessed 10 October 2009.
[11] ARM Limited. ARM Architecture Reference Manual: ARMv7-A and ARMv7-R edition.
placeholder at http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406b/index.html
(accessed 2 November 2009); document must be obtained via an ARM support website account, 2009.
[12] ARM Limited. ARM PrimeCell Infrastructure AMBA 3 TrustZone Protection Controller
(BP147) Revision: r0p0. available at
http://infocenter.arm.com/help/topic/com.arm.doc.dto0015a/DTO0015_primecell_infrastructure_amba3_tzpc_bp147_to.pdf (accessed 3 November 2009), 2004.
[13] ARM Limited. ARM Security Technology: Building a Secure System using TrustZone
Technology. available at http://infocenter.arm.com/help/topic/com.arm.doc.prd29-genc-009492c/PRD29-GENC-009492C_trustzone_security_whitepaper.pdf (last accessed 25
September 2009), 2009.
[14] ARM Limited. ARM TrustZone System Design page. available at
http://www.arm.com/products/security/trustzone/systemdesign.html (last accessed 25
September 2009), 2009.
[15] ARM Limited. PrimeCell Infrastructure AMBA 3 AXI TrustZone Memory Adapter
(BP141) Revision: r0p0. available at
http://infocenter.arm.com/help/topic/com.arm.doc.dto0015a/DTO0015_primecell_infrastructure_amba3_tzpc_bp147_to.pdf (accessed 3 November 2009), 2004.
[16] François Armand and Michel Gien. A Practical Look at Micro-Kernels and Virtual
Machine Monitors. Proceedings of the 6th Consumer Communications and Networking
Conference (IEEE CCNC '09), Las Vegas, NV, USA, 2009.
[17] Vasanth Bala and Evelyn Duesterwald and Sanjeev Banerjia. Dynamo: A Transparent
Dynamic Optimization System. Proceedings of the ACM SIGPLAN 2000 Conference on
Programming Language Design and Implementation (PLDI), Vancouver, British Columbia,
Canada, 2000.
[18] Leonid Baraz and Tevi Devor and Orna Etzion and Shalom Goldenberg and Alex
Skaletsky and Yun Wang and Yigal Zemach. IA-32 execution layer: a two-phase dynamic
translator designed to support IA-32 applications on Itanium®-based systems. Proceedings
of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, pages 191-201, 2003.
[19] Paul Barham and Boris Dragovic and Keir Fraser and Steven Hand and Tim Harris and
Alex Ho and Rolf Neugebauer and Ian Pratt and Andrew Warfield. Xen and the Art of
Virtualization. Proceedings of the nineteenth ACM symposium on Operating systems
principles in Operating Systems Review, pages 164--177, New York, 2003. ACM Press.
[20] Andrew Baumann and Paul Barham and Pierre-Evariste Dagand and Tim Harris and
Rebecca Isaacs and Simon Peter and Timothy Roscoe and Adrian Schüpbach and
Akhilesh Singhania. The Multikernel: A new OS architecture for scalable multicore systems.
Proceedings of the 22nd ACM Symposium on OS Principles (SOSP '09), Big Sky, MT,
USA, 2009.
[21] Andrew Baumann and Simon Peter and Adrian Schüpbach and Akhilesh Singhania and
Timothy Roscoe and Paul Barham and Rebecca Isaacs. Your computer is already a
distributed system. Why isn’t your OS?. Proceedings of the 12th USENIX Workshop on Hot
Topics in Operating Systems, Monte Verità, Switzerland, 2009.
[22] Steven M. Bellovin. Virtual Machines, Virtual Security?. CACM: Communications of
the ACM, 49(10):104, 2006.
[23] Christer Bengtsson and Mats Brorsson and Håkan Grahn and Erik Hagersten and Bengt
Jonsson and Christoph Kessler and Björn Lisper and Per Stenström and Bertil Svensson.
Multicore computing — the state of the art. available at http://eprints.sics.se/3546/01/SMI-MulticoreReport-2008.pdf, accessed 3 October 2009, 2008.
[24] Common Criteria Development Board and Common Criteria Maintenance Board.
Common Criteria for Information Security Evaluation v3.1 release 3. Members of the
Common Criteria Recognition Agreement, 2009. available at
http://www.commoncriteriaportal.org/thecc.html (last accessed 19 September 2009).
[25] Jörg Brakensiek and Axel Dröge and Martin Botteck and Hermann Härtig and Adam
Lackorzynski. Virtualization as an Enabler for Security in Mobile Devices. First Workshop
on Isolation and Integration in Embedded Systems (IIES '08), Glasgow, UK, 2008.
[26] Sergey Bratus and Michael E. Locasto. Traps, Events, Emulation, and Enforcement:
Managing the Yin and Yang of Virtualization-Based Security. Proceedings of the 1st ACM
Workshop on Virtual Machine Security (VMSEC '08), Fairfax, VA, USA, 2008.
[27] David Chaum and Torben Pryds Pedersen. Wallet Databases with Observers. Advances
in Cryptology – CRYPTO 1992, LNCS 740, pages 89-105, 1993.
[28] Xiaoxin Chen and Tal Garfinkel and E. Christopher Lewis and Pratap Subrahmanyam
and Carl A. Waldspurger and Dan Boneh and Jeffrey Dwoskin and Dan R.K. Ports.
Overshadow: A Virtualization-Based Approach to Retrofitting Protection in Commodity
Operating Systems. Proceedings of the 13th Annual International ACM Conference on
Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008.
[29] David Chisnall. The Definitive Guide to the Xen Hypervisor. Prentice Hall, 2008.
[30] Andy Chou and Junfeng Yang and Benjamin Chelf and Seth Hallem and Dawson
Engler. An Empirical Study of Operating System Errors. Proceedings of the 18th ACM
Symposium on Operating System Principles (SOSP), 2001.
[31] George Coker. Xen Security Modules (XSM). Xen Summit 2007 presentation,
http://www.xen.org/files/xensummit_4/xsm-summit-041707_Coker.pdf (accessed Oct 4
2009), 2007.
[32] Landon P. Cox and Peter M. Chen. Pocket Hypervisors: Opportunities and Challenges.
Eighth IEEE Workshop on Mobile Computing Systems and Applications, 2007.
[33] John Criswell and Andrew Lenharth and Dinakar Dhurjati and Vikram Adve. Secure
Virtual Architecture: A Safe Execution Environment for Commodity Operating Systems.
21st ACM Symposium on Operating System Principles (SOSP '07), pages 351-366, 2007.
[34] Leonardo Dagum and Ramesh Menon. OpenMP: An Industry Standard API for Shared-Memory Programming. IEEE Computational Science and Engineering, 5(1):46-55, 1998.
[35] George W. Dunlap and Samuel T. King and Sukru Cinar and Murtaza A. Basrai and Peter
M. Chen. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and
Replay. Proceedings of the 2002 Symposium on Operating Systems Design and
Implementation (OSDI), 2002. (published in special issue of ACM SIGOPS Operating
Systems Review, volume 36, winter 2002).
[36] George W. Dunlap and Dominic G. Lucchetti and Peter M. Chen and Michael A.
Fetterman. Execution Replay for Multiprocessor Virtual Machines. Proceedings of the 2008
ACM SIGPLAN/SIGOPS international conference on Virtual Execution Environments (VEE
'08), Seattle, WA, USA, 2008.
[37] Dawson R. Engler and M. Frans Kaashoek and James W. O'Toole. Exokernel: An
Operating System Architecture for Application-Level Resource Management. Proceedings
of the 15th Symposium on Operating System Principles(SOSP 1995), 1995.
[38] Tal Garfinkel and Ben Pfaff and Jim Chow and Mendel Rosenblum and Dan Boneh.
Terra: A Virtual Machine-Based Platform for Trusted Computing. Proceedings of the 19th
Symposium on Operating System Principles(SOSP 2003), 2003.
[39] Tal Garfinkel and Mendel Rosenblum. A Virtual Machine Introspection Based
Architecture for Intrusion Detection. Proc. Network and Distributed Systems Security
Symposium, 2003.
[40] Tal Garfinkel and Mendel Rosenblum. When Virtual is Harder than Real: Security
Challenges in Virtual Machine Based Computing Environments. Proceedings of the 10th
Workshop on Hot Topics in Operating Systems (HotOS-X), 2005.
[41] Robert P. Goldberg. Survey of Virtual Machine Research. IEEE Computer, 7(6):34--45,
1974. available at https://agora.cs.illinois.edu/download/attachments/10454931/goldberg74.pdf.
[42] Steven Hand and Andrew Warfield and Keir Fraser and Evangelos Kotsovinos and Dan
Magenheimer. Are Virtual Machine Monitors Microkernels Done Right?. Proceedings of
the 10th USENIX Workshop on Hot Topics in Operating Systems, Santa Fe, NM, USA,
2005.
[43] Brian Hay and Kara Nance. Forensics Examination of Volatile System Data Using
Virtual Introspection. ACM SIGOPS Operating Systems Review, 42(3):74--82, 2008.
[44] Gernot Heiser. The Role of Virtualization in Embedded Systems. First Workshop on
Isolation and Integration in Embedded Systems (IIES '08), Glasgow, UK, 2008.
[45] Gernot Heiser and Volkmar Uhlig and Joshua LeVasseur. Are Virtual-Machine Monitors
Microkernels Done Right?. Operating Systems Review, 40(1):95--99, 2006.
[46] Joo-Young Hwang and Sang-Bum Suh and Sung-Kwan Heo and Chan-Ju Park and Jae-Min Ryu and Seong-Yeol Park and Chul-Ryun Kim. Xen on ARM: System Virtualization
using Xen Hypervisor for ARM-based Secure Mobile Phones. 5th IEEE Consumer
Communications and Networking Conference (CCNC 2008), Las Vegas, NV, USA, 2008.
[47] Hermann Härtig and Michael Hohmuth and Norman Feske and Christian Helmuth and
Adam Lackorzynski and Frank Mehnert and Michael Peter. The Nizza Secure-System
Architecture. Proceedings of CollaborateCom '05, San Jose, CA, USA, 2005.
[48] Intel Corporation. Intel QuickPath Technology page.
http://www.intel.com/technology/quickpath/ (accessed 12 October, 2009), 2009.
[49] Intel Corporation. Intel Virtualization Technology for Directed I/O. available at
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf (last
accessed 24 September, 2009), 2008.
[50] Megumi Ito and Shuichi Oikawa. Lightweight Shadow Paging for Efficient Memory
Isolation in Gandalf VMM. 11th IEEE Symposium on Object Oriented Real-Time
Distributed Computing (ISORC), pages 508-515, Orlando, FL, USA, 2008.
[51] Xuxian Jiang and Xinyuan Wang. Stealthy Malware Detection Through VMM-Based
'Out of the Box' Semantic View Reconstruction. Proceedings of the 14th ACM Conference
on Computer and Communications Security (CCS '07), Alexandria, VA, USA, 2007.
[52] Ashlesha Joshi and Samuel T. King and George W. Dunlap and Peter M. Chen.
Detecting Past and Present Intrusions through Vulnerability-Specific Predicates. Proceedings
of the 20th Symposium on Operating System Principles(SOSP 2005), pages 91--104,
Brighton, UK, 2005.
[53] Gerwin Klein and Kevin Elphinstone and Gernot Heiser and June Andronick and David
Cock and Philip Derrin and Dhammika Elkaduwe and Kai Engelhardt and Rafal Kolanski
and Michael Norrish and Thomas Sewell and Harvey Tuch and Simon Winwood. seL4:
Formal Verification of an OS Kernel. Proceedings of the 22nd ACM Symposium on OS
Principles (SOSP '09), Big Sky, MT, USA, 2009. available at
http://ertos.nicta.com.au/publications/papers/Klein_EHACDEEKNSTW_09.pdf (accessed 26
September, 2009).
[54] Kirk L. Kroeker. The Evolution of Virtualization. Communications of the ACM,
52(3):18--20, 2009.
[55] Sanjay Kumar and Ada Gavrilovska and Karsten Schwan and Srikanth Sundaragopalan.
C-CORE: Using Communication Cores for High Performance Network Services.
Proceedings of the 4th IEEE International Symposium on Network Computing and
Applications (NCA '05), 2005.
[56] Sanjay Kumar and Himanshu Raj and Karsten Schwan and Ivan Ganev. Re-architecting
VMMs for Multicore Systems: The Sidecore Approach. Proceedings of the 2007 Workshop
on the Interaction between Operating Systems and Computer Architecture, 2007.
[57] L4HQ.org. L4Linux info page. http://l4linux.org/ (accessed 28 Sept 2009), 2009.
[58] L4Ka.org. L4Ka pre-virtualization page. http://l4ka.org/projects/virtualization/afterburn/
(accessed 28 Sept 2009), 2009.
[59] Lionel Litty and H. Andrés Lagar-Cavilla and David Lie. Hypervisor Support for
Identifying Covertly Executing Binaries. Proceedings of the 17th USENIX Security
Symposium, pages 243--258, San Jose, CA, USA, 2008.
[60] Gabriel H. Loh. 3D-stacked Memory Architectures for Multi-core Processors.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA '08),
pages 453-464, Washington, DC, USA, 2008.
[61] Gil Neiger. Intel Virtualization Technology: Hardware Support for Efficient Processor
Virtualization. Intel Technology Journal, 10(3):167--177, 2006.
[62] NICTA. Iguana project page. http://www.ertos.nicta.com.au/software/kenge/iguana-project/latest/ (last accessed 19 September, 2009), 2009.
[63] NICTA. NICTA announces world-first research breakthrough. seL4 verification NICTA
press release, available at http://www.nicta.com.au/news/home_page_content_listing/world-first_research_breakthrough_promises_safety-critical_software_of_unprecedented_reliability
(last accessed 19 September, 2009), 2009.
[64] Open Kernel Labs. Open Kernel Labs Secure Hypercell Technology page.
http://www.ok-labs.com/solutions/secure-hypercell-technology (accessed 28 September, 2009), 2009.
[65] OpenVZ project. OpenVZ Wiki Main Page. http://wiki.openvz.org/Main_Page (accessed
28 Sept 2009), 2009.
[66] Bryan D. Payne and Martim Carbone and Wenke Lee. Secure and Flexible Monitoring
of Virtual Machines. Proceedings of the 23rd Annual Computer Security Applications
Conference (ACSAC 2007), 2007.
[67] Bryan D. Payne and Martim Carbone and Monirul Sharif and Wenke Lee. Lares: An
Architecture for Secure Active Monitoring Using Virtualization. Proceedings of the IEEE
Symposium on Security and Privacy, 2008.
[68] Steven Pope and David Riddoch. Virtualization and multicore x86 CPUs. EDN, 2008.
available at http://www.edn.com/article/CA6584878.html, accessed 11 October 2009.
[69] Gerald J. Popek and Robert P. Goldberg. Formal Requirements for Virtualizable Third
Generation Architectures. Communications of the ACM, 17(7):412-421, 1974.
[70] Dan Ports and Tal Garfinkel. Towards Application Security On Untrusted Operating
Systems. USENIX Workshop on Hot Topics in Security (HOTSEC), 2008.
[71] Robert D. Blumofe and others. Cilk: An Efficient Multithreaded Runtime System. ACM
SIGPLAN Notices, 30(8):207-216, 1995.
[72] Bratin Saha and others. Enabling Scalability and Performance in a Large Scale CMP
Environment. Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on
Computer Systems, pages 73 - 86, Lisbon, Portugal, 2007.
[73] Reiner Sailer and Trent Jaeger and Enriquillo Valdez and Ronald Perez and Stefan
Berger and John Linwood Griffin and Leendert van Doorn. Building a MAC-Based Security
Architecture for the Xen Open-Source Hypervisor. Proceedings of the 21st Annual
Computer Security Applications Conference (ACSAC '05), pages 276--285, 2005.
[74] Karen Scarfone and John Padgette. Bluetooth Security Guide: Recommendations of the
National Institute of Standards and Technology. NIST Special Publication 800-121, available
at http://csrc.nist.gov/publications/nistpubs/800-121/SP800-121.pdf (last accessed 20
September 2009), 2008.
[75] Michael D. Schroeder and Jerome H Saltzer. A hardware architecture for implementing
protection rings. Proceedings of the 3rd ACM Symposium on Operating System Principles
(SOSP '71), Palo Alto, CA, USA, 1971.
[76] Adrian Schüpbach and Simon Peter and Andrew Baumann and Timothy Roscoe and Paul
Barham and Tim Harris and Rebecca Isaacs. Embracing diversity in the Barrelfish manycore
operating system. Proceedings of the Workshop on Managed Many-Core Systems (MMCS
'08), Boston, MA, USA, 2008.
[77] Arvind Seshadri and Mark Luk and Ning Qu and Adrian Perrig. SecVisor: A Tiny
Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes. Proceedings of
the 21st Symposium on Operating System Principles(SOSP 2007), Stevenson, Washington,
USA, 2007.
[78] Takahiro Shinagawa and others. BitVisor: A Thin Hypervisor for Enforcing I/O Device
Security. Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on
Virtual Execution Environments (VEE '09), Washington, D.C., USA, 2009.
[79] James E. Smith and Ravi Nair. The Architecture of Virtual Machines. IEEE Computer,
38(5):32-38, 2005.
[80] James E. Smith and Ravi Nair. Virtual Machines: Versatile Platforms for Systems and
Processes. Morgan Kaufmann/Elsevier, 2005.
[81] Stephen Soltesz and Herbert Pötzl and Marc E. Fiuczynski and Andy Bavier and Larry Peterson. Container-Based Operating System Virtualization: A Scalable, High-Performance Alternative to Hypervisors. Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys 2007), pages 275-287, Lisbon, Portugal, 2007.
[82] Ray Spencer and Peter Loscocco and Stephen Smalley and Mike Hibler and David Andersen and Jay Lepreau. The Flask Security Architecture: System Support for Diverse Security Policies. Proceedings of the 8th USENIX Security Symposium, 1999.
[83] Jeremy Sugerman and Ganesh Venkitachalam and Beng-Hong Lim. Virtualizing I/O
Devices on VMware Workstation’s Hosted Virtual Machine Monitor. Proceedings of the
2001 USENIX Annual Technical Conference, Boston, MA, USA, 2001. Available at http://www.vmware.com/vmtn/resources/530 (last accessed 19 September, 2009).
[84] Sang-Bum Suh and Joo-Young Hwang and others. Computing State Migration between Mobile Platforms for Seamless Computing Environments. 5th IEEE Consumer Communications and Networking Conference (CCNC 2008), Las Vegas, NV, USA, 2008.
[85] Sang-Bum Suh and Sung-Min Lee and Sangdok Mo and Bokdeuk Jeong and Joo-Young
Hwang and Chan-Ju Park and Sung-Kwan Heo and Junghyun Yoo and Jae-Min Ryu and
Chul-Ryun Kim and Seong-Yeol Park and Jae-Ra Lee and Il-Pyung Park and Hosoo Lee.
Demonstration of the Secure VMM for Beyond 3G Mobile Terminal. 5th IEEE Consumer
Communications and Networking Conference (CCNC 2008), Las Vegas, NV, USA, 2008.
[86] Tilera. Tilera processor page. http://www.tilera.com/products/processors.php (last accessed 11 October, 2009), 2009.
[87] Trusted Computing Group. TCG Mobile Reference Architecture. Available at http://www.trustedcomputinggroup.org/resources/mobile_phone_work_group_mobile_reference_architecture (last accessed 28 September, 2009), 2007.
[88] TU Dresden. The L4 µ-Kernel Family. http://os.inf.tu-dresden.de/L4/ (last accessed 19
September, 2009).
[89] Steven J. Vaughan-Nichols. New Approach to Virtualization is Lightweight. IEEE
Computer, 39(11):12-14, 2006.
[90] Steven J. Vaughan-Nichols. Virtualization Sparks Security Concerns. IEEE Computer,
41(8):13-15, 2008.
[91] VMware. Performance Evaluation of AMD RVI Hardware Assist. Available at http://www.vmware.com/resources/techresources/1079 (last accessed 22 September, 2009).
[92] VMware. Transparent Paravirtualization info page. http://www.vmware.com/interfaces/paravirtualization.html (last accessed 28 September, 2009).
[93] VMware. VMware ESX and ESXi product page. http://www.vmware.com/products/esx/index.html (last accessed 19 September, 2009).
[94] VMware. VMware Workstation product page. http://www.vmware.com/products/workstation/index.html (last accessed 19 September, 2009).
[95] John Watson. VirtualBox: Bits and Bytes Masquerading as Machines. Linux Journal, 2008. Available at http://www.linuxjournal.com/article/9941 (last accessed 19 September, 2009).
[96] Robert N. M. Watson. Exploiting Concurrency Vulnerabilities in System Call Wrappers.
1st USENIX Workshop on Offensive Technologies, 2007.
[97] Clark Weissman. BLACKER: Security for the DDN, Examples of A1 Security Engineering Trades. Proceedings of the 1992 IEEE Symposium on Research in Security and Privacy, pages 286-292, Oakland, CA, USA, 1992.
[98] Philip M. Wells and Koushik Chakraborty and Gurindar S. Sohi. Dynamic Heterogeneity
and the Need for Multicore Virtualization. ACM SIGOPS Operating Systems Review,
43(2):5-14, 2009.
[99] Philip M. Wells and Koushik Chakraborty and Gurindar S. Sohi. Mixed-Mode Multicore
Reliability. Proceedings of the 14th Annual International ACM Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS '09), 2009.
[100] Andrew Whitaker and Marianne Shaw and Steven D. Gribble. Denali: Lightweight
Virtual Machines for Distributed and Networked Applications. Proceedings of the USENIX
Annual Technical Conference, 2002.
[101] Wikipedia. Ring (computer security). Available at http://en.wikipedia.org/wiki/Ring_(computer_security), last modified on 21 August 2009 (last accessed 19 September, 2009), 2009.
[102] Johannes Winter. Trusted Computing Building Blocks for Embedded Linux-based ARM
TrustZone Platforms. Proceedings of the 3rd ACM Workshop on Scalable Trusted
Computing, pages 21-30, Fairfax, VA, USA, 2008.
[103] Xen ARM Project. Xen ARM Project page. http://wiki.xensource.com/xenwiki/XenARM (last accessed 20 September, 2009), 2009.
[104] Min Xu and Xuxian Jiang and Ravi Sandhu and Xinwen Zhang. Towards a VMM-based
Usage Control Framework for OS Kernel Integrity Protection. Proceedings of the 12th ACM
Symposium on Access Control Models and Technologies (SACMAT 2007), Sophia
Antipolis, France, 2007.
[105] Jisoo Yang and Kang G. Shin. Using Hypervisor to Provide Data Secrecy for User
Applications on a Per-Page Basis. Proceedings of the Fourth ACM SIGPLAN/SIGOPS
International Conference on Virtual Execution Environments (VEE '08), pages 71-80,
Seattle, WA, USA, 2008.
[106] Xinwen Zhang and Onur Aciicmez and Jean-Pierre Seifert. A Trusted Mobile Phone
Reference Architecture via Secure Kernel. Proceedings of the 2nd ACM Workshop on
Scalable Trusted Computing, Alexandria, VA, USA, 2007.