On Realizing Application QoS on a
Virtualized Network
Annual Progress Seminar Report
Submitted in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
Rinku Shah
(Roll No. 134053001)
Under the guidance of
Prof. Umesh Bellur
Prof. Purushottam Kulkarni
Department of Computer Science and Engineering
Indian Institute of Technology, Bombay
August, 2014
I would like to take this opportunity to thank my advisors, Prof. Umesh Bellur and Prof.
Purushottam Kulkarni, for their constant guidance, direction, and valuable feedback.
I would also like to thank Prof. Varsha Apte and Prof. Mythili Vutukuru for their
support and feedback. I would like to thank my colleague Debadatta Mishra for lending
me a helping hand. Lastly, I would like to thank my family and friends for their constant
love and support.
Rinku Shah
Machine virtualization has proved beneficial to both service providers and service
users. Service providers can increase financial gains by using techniques like migration
and consolidation of virtual machines, thereby requiring a smaller number of physical
machines. Service users get a software view of their machines: they can easily create,
destroy, migrate, clone, and snapshot their virtual machines, and they pay only for
what they use.
After the success of ‘Compute Virtualization’, the world is now interested in
‘Network Virtualization’, which promises the same features for networks. If the
network becomes a piece of software, it is easy to create, destroy, migrate, clone,
and snapshot the network. But this new bag of features should not come at the
cost of performance.
In this report, we propose a methodology for realizing Application QoS in a
Virtualized Network. We also identify other possible research problem areas for
a Software-defined Data-center solution that aim at increasing resource utilization
and improving application performance.
Contents

1 Work Progress
  Courses Taken
  Seminar Presentation
2 Network Virtualization
  Why do we require to Virtualize Networks?
  Network Virtualization
    The Network Virtualization Stack
    Areas for Exploration
3 Building Blocks of a Virtualized Network
  Data-plane building blocks
    PV-Bridging
    SR-IOV
    Intel’s Data plane development kit, DPDK
  Data-plane Management blocks
    NetVM
    FasTrack
    Scalable NIC for end-host Rate limiting (SENIC)
    ClickOS
  Summary
4 Problem Formulation
  Requirements for a Network-QoS solution
  Hypothesis
  Solution for transmit bandwidth partitioning [9]
5 Conclusion
List of Figures

1 The Network Virtualization Stack
2 Example Network Topology
3 Example Network Topology with Performance Metrics
4 Steps involved for achieving Application-QoS guarantees for a virtualized network
5 Paravirtualized Device Model (Source: [10])
6 SR-IOV virtualization architecture (Source: [11])
7 NetVM System Overview (Source: [18])
8 FasTrack Overview
9 SENIC: Schedule and Pull model (Source: [21])
10 Bandwidth Control of Intel 82576 GbE (Source: [5])
Work Progress
Courses Taken
Table 1: Courses Taken (2013-14)

Course Code   Course Title
CS 681        Performance Analysis of Computer Systems and Network
CS 683        Advanced Computer Architecture
CS 699        Software Lab
HS 699        Communication and Presentation Skills
TD 610        Contemporary Critical Issues in Technology and Development
Seminar Presentation
Network I/O virtualization is a challenging problem to solve. In the seminar report,
various causes of network I/O virtualization overhead were analyzed. The report
also provided a detailed classification of network I/O virtualization optimization
techniques, and compared the performance of the paravirtualized, emulated, and
direct I/O models with respect to network I/O access.
The report concluded by identifying open challenges:
1. Dynamic setup of VM network I/O policies
2. Offloading Netfilter functions
3. Live migration in the direct I/O model
4. Scalability in the direct I/O model
CPI = 9.08
Network Virtualization
Virtual networks, like virtual machines, are automatically provisioned and managed,
independent of the underlying hardware. Various network virtualization solutions
offer migration of hardware network components into logical software components.
Translating a hardware component into a software object provides numerous benefits:
these software-translated hardware entities can easily be created, destroyed, migrated,
and cloned. But along with all these benefits comes a performance trade-off, as tasks
previously performed in hardware are now performed by software. Every network
virtualization solution provider needs to solve this performance problem.
Why do we require to Virtualize Networks?
There are various reasons for Network Virtualization’s popularity [12].
1. Network Abstraction
Network virtualization provides an abstraction of a network and bundles all
network components together as a software object. The network can then easily
be created, deleted, changed, migrated, snapshotted, and cloned.
2. Quick, On-demand, and Automated Network Provisioning
Network virtualization solutions prepare the complete network model in software
and can enable any network topology in seconds. They provide a library of logical
networking elements and services, such as logical switches, routers, firewalls, load
balancers, VPNs, and workload security.
3. Workload Placement
Network virtualization can provide large virtual segments for workload placement.
Without it, workload placement and mobility are confined to individual physical
subnets: network operators must manually configure VLANs, ACLs, firewall rules,
etc. on the hardware boxes. This is a slow process, and it also hits configuration
limits; for example, at most 4096 VLANs can be configured.
4. Live Mobility
With network virtualization, it is possible to live-migrate the entire network,
or even a single VM of the network, without loss of layer-2 connectivity.
5. Elasticity and Scalability
The technology choices offered by network virtualization solutions do not limit
the number of tenants, and components and services can be dynamically added to
or removed from an existing network.
All these features favour the evolution of virtualized network systems. It is also
necessary for virtualized network systems to offer at least equivalent performance.
Our research intends to analyze and provide solutions for a performance-aware
virtualized network system.
Network Virtualization
The Network Virtualization Stack
Figure 1: The Network Virtualization Stack. Top to bottom: Cloud Management
Platform (CMP); Physical & Logical Network Management Plane (Network QoS and
rule management, VM placement & migration, inter-VM flows, physical/logical object
failure management); Physical & Logical Network Control Plane (tunnel creation at
VMs and physical hosts, VM placement and migration, routing control for internal
logical networks and for the outside); Data Plane Building Blocks; Data Plane;
Physical layer.
The typical layers of any Software-defined Data-center solution are shown in
Figure 1. A typical network virtualization solution provider divides its architecture
into three layers: Physical, Virtual, and Cloud.
The Physical layer consists of hypervisor hosts and appliances such as service
nodes and gateways. Gateways provide connectivity to the outside world; service
nodes provide centralized functionality, such as tunneling for unicast and multicast
traffic. A typical Virtual layer has the following logical planes: Data plane, Data-plane
management, Control plane, and Management plane.
A typical Cloud layer comprises solutions that provide cloud services, i.e. the
Cloud Management Platform (CMP). The CMP is the consumer of this network
virtualization system; it inputs the network topology through a sequence of API
calls. The Application layer comprises multi-tiered or single-tiered applications, or
NFV applications such as firewalls, intrusion detection systems, and load balancers.
We take a multi-tiered application, comprising a web tier, an application tier,
and a database tier, as the running example to drive through the network
virtualization stack. During this traversal, we propose our methodology for providing
Application QoS guarantees for the virtualized network. Current systems consider
only the application traffic classes to guarantee application QoS.
Figure 2: Example Network Topology
Consider the example network topology of a multi-tiered system shown in
Figure 2. To guarantee application QoS, we need the performance metrics between
each pair of communicating nodes in the topology. Our assumption is that these
performance metrics are given by the customer of the multi-tiered system.

In our example, HTTP requests arrive from the outside world at the web server;
we name the outside-world entity ‘External’. The web server decodes the request
type and accordingly forwards the request to either Application Server 1 or
Application Server 2. The application servers may, in turn, query the database
server. We chose this scenario to exemplify a case where some tasks have higher
priority than others, i.e., the database server should provide different processing
delays for flows coming from Application Server 1 and Application Server 2. For
example, in a banking application, one app server is responsible for handling
real-time operations like ‘Withdrawal’, whereas the other app server is responsible
for handling non-real-time operations like cheque-book request processing. Clearly,
the ‘Withdrawal’ operation should be given higher priority and be processed faster.

Figure 3: Example Network Topology with Performance Metrics (edges annotated
with tuples such as <B_EW, B_WE>, <B_WA1, B_A1W>, <B_WA2, B_A2W>,
<B_A1D, B_DA1, D_DA1>, and <B_A2D, B_DA2, D_DA2>)

The example in Figure 3 provides the performance metric values between the
communicating nodes. B_ij is the convention used to represent the bandwidth
requirement between nodes ‘i’ and ‘j’. Similarly, ProcD_ij represents the maximum
processing delay that can be tolerated between nodes ‘i’ and ‘j’.
We use ‘E’ to represent nodes external to the data center; ‘W’, ‘A1’, ‘A2’, and
‘D’ represent the web server, application server 1, application server 2, and the
database server, respectively.
The application QoS information can be fed as input to the data-center manager
node through an API, in tabular form as shown in Tables 2 and 3.

Table 2: Bandwidth guarantees requested for the sample topology

Edge:       E-W    W-E    W-A1    A1-W    W-A2    A2-W    A1-D    D-A1    A2-D    D-A2
Guarantee:  B_EW   B_WE   B_WA1   B_A1W   B_WA2   B_A2W   B_A1D   B_DA1   B_A2D   B_DA2

Table 3: Processing Delay guarantees requested for the sample topology

Edge:       D-A1        D-A2
Guarantee:  ProcD_DA1   ProcD_DA2
Figure 4: Steps involved for achieving Application-QoS guarantees for a virtualized network
The following sequence of steps, shown in Figure 4, is involved in achieving Application QoS guarantees:
1. The customer provides three inputs to the Cloud Management Platform (CMP):
the network topology (an adjacency matrix), the requested bandwidth-guarantee
matrix, and the requested processing-delay-guarantee matrix.
2. The CMP generates a sequence of API calls to the Data-center manager for
the requested topology.
3. A manager node resides in the management plane. It is the single point of
configuration through which the virtual network requirements are fed into the
Software Defined Data Center. Any CMP can communicate with the manager
through an API; this plane exposes a rich set of API libraries to the outside
world.
The manager node always has the complete picture of the data center. It takes
in the requirement in the form of a network topology and a set of performance
constraints. It creates a set of configuration policies to satisfy the performance
requirements, and it makes placement decisions considering the available
resources and the requested QoS. A number of other management tasks, such as
failure recovery and migration, are also taken care of by this layer.
The data-center manager node receives the API calls from the CMP. It has the
complete view of the data center; i.e., it is aware of the resource availability
and capability of each physical entity in the data center. The capability of a
device means the kind of data-plane building block that is available on it.
Using the capability and availability information, the policy manager looks up
the bandwidth and delay requirements, and specifies the data-plane building
block, along with its configuration details, to be used to satisfy the required
QoS.
For each logical object (node), the manager provides:
• The data-plane building block to be used.
• The set of data-plane management configurations that satisfy the requested
QoS metric values.
• The physical host on which the object is to be placed. We would use the
placement algorithms provided by the manager, and do not intend to modify
them.
For the entire network topology, the layer-2 connectivity configuration is also
generated by the manager. It is then the job of the control plane to implement
the set of configurations delivered by the manager.
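As a sketch of the decision described above, the manager's mapping from host capability and requested QoS to a data-plane building block might look as follows. The capability names and selection rules here are illustrative assumptions, not the actual manager logic:

```python
# Illustrative (not the actual manager logic): choose a data-plane building
# block for a logical object, given the host's capability and requested QoS.

def choose_building_block(host_capabilities, needs_guaranteed_bw, needs_low_latency):
    """Return the building block to configure on the chosen host.

    host_capabilities: set of blocks available on the host,
    e.g. {"sriov", "dpdk", "pv"}.
    """
    if needs_low_latency and "dpdk" in host_capabilities:
        return "dpdk"    # user-space packet processing, lowest latency
    if needs_guaranteed_bw and "sriov" in host_capabilities:
        return "sriov"   # hardware VFs give deterministic bandwidth
    return "pv"          # paravirtualized bridging as best-effort default

# Example: a host with SR-IOV but no DPDK, and a flow needing a bandwidth
# guarantee, gets an SR-IOV virtual function.
block = choose_building_block({"sriov", "pv"},
                              needs_guaranteed_bw=True,
                              needs_low_latency=False)
assert block == "sriov"
```

A real manager would also consult resource availability on each candidate host before committing to a placement.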
Control plane
A typical controller is responsible for executing control-plane requirements such
as creating tunnels between VM nodes for layer-2 connectivity, network QoS
policy control, and controlling firewall rules, ACLs, etc. The controller nodes
are deployed as a cluster to ensure high availability and scalability. The
controller cluster configures all the soft switches with the configuration
provided by the manager node. Control-plane nodes act as actuators that
implement the configuration and policies specified by the management modules.
Control-plane modules also need to ensure that routing paths and configurations
are correctly disseminated for internal as well as external paths; to achieve
this, they need to virtualize routing protocols like IS-IS.
Data plane & Data-plane management
A typical data plane consists of the physical-layer abstraction of devices like
switches and routers; it also enables access-level switching. Virtual network
segments are implemented between soft switches using MAC-over-IP encapsulation;
to support overlay networking, GRE, STT, or VXLAN encapsulation can be used.
Examples of data-plane components are the vSphere Distributed Switch (VDS) for
vSphere, or Open vSwitch for non-ESX hypervisors.
The data-plane management components manage and configure the data-plane
components to implement tasks like satisfying network QoS and implementing
layer-2 segment connectivity for the VMs. Examples of data-plane management
components are solutions like OpenFlow, NetVM, and FasTrack.
3*. There could be a case where enough resources are not available to satisfy
the requested QoS values. Under such circumstances, the data-center manager
generates feedback to the CMP, requesting changes in the QoS metric values.
4. The CMP passes the request on to the customer.
5. The customer provides the detailed network topology with modified QoS metric
values, and the process repeats from step 2.
Areas for Exploration
Application Composition/Decomposition
In certain applications, input data needs to traverse multiple logical stages.
For example, in NFV systems like an Intrusion Detection System (IDS), data packets
may flow from one module to another, resulting in module chains. If we could
decompose the IDS virtual object into multiple logical objects, it would be easier
to scale the system.
Figuring out the appropriate decomposition of an application into modules would
help in better resource utilization and elasticity, while keeping related sets of
modules together would improve performance. Figuring out an appropriate
application composition and decomposition is one possible problem area.
Performance vs. Scale Tradeoff
Application composition/decomposition would help in giving a flat design, with
higher resource utilization and better scalability. But as the chain of modules
grows, the end-to-end delays increase. Figuring out a ‘just-right’
composition/decomposition configuration, such that the required performance
criteria are met and the application still scales well, is another possible
problem area.
Building Blocks of a Virtualized Network
We have analyzed some of the data-plane building blocks, and derived methods for
configuring the data-plane objects through the data-plane management objects so
as to achieve the required QoS metric values. We intend to identify all such
building blocks and perform the same analysis. The following sections detail the
working and the configuration parameters/methods of each building block.
Data-plane building blocks

PV-Bridging

Figure 5: Paravirtualized Device Model (Source: [10])
In the paravirtualized I/O model [10], device drivers are split into two parts:
a front-end and a back-end driver. The front-end driver is installed in the guest
VM, whereas the back-end driver is installed in a privileged VM (a.k.a. the
driver domain, or Domain 0); the guest OS needs to be modified to achieve this.
When an I/O operation is requested by a guest application, the front-end driver
forwards the request to the back-end driver. The back-end driver decodes the
request, maps it to the hardware, and directs the device to complete execution.
The back-end driver can help in managing resource control as well as many
optimizations, like batching guest requests; several such optimizations are
implemented to improve its efficiency. A packet transmission takes place
according to the following steps:
1. Packet copy/remap from the VM's front-end driver to the driver domain's
back-end driver
2. Route the packet through the Ethernet bridge to the physical NIC's driver
3. Enqueue the packet for transmission on the network interface
A packet reception takes place according to the following steps:
1. The NIC generates an interrupt, which is captured by the hypervisor
2. The captured interrupt is routed from the hypervisor to the NIC's device
driver in the driver domain as a virtual interrupt
3. The NIC's device driver transfers the packet to the Ethernet bridge
4. The bridge routes the packet to the appropriate back-end driver
5. The back-end driver copies/remaps the packet to the front-end driver in the
target VM
6. The back-end driver requests the hypervisor to send a virtual interrupt to
the front-end driver in the target VM
7. The hypervisor sends the virtual interrupt to the front-end driver of the
target VM
8. The front-end driver delivers the packet to the VM's network stack
As discussed above, the paravirtualized I/O model has central control over the
network resources, because all incoming and outgoing packets go through the
hypervisor. It is therefore possible to specify network QoS policies at the
hypervisor; we could also have bandwidth partitioning performed at the VM's
back-end driver. The problem with this model is that it consumes very high CPU
to achieve line rates.
SR-IOV

Figure 6: SR-IOV virtualization architecture (Source: [11])
Single Root I/O Virtualization and Sharing (SR-IOV) [11] is a fast I/O
virtualization standard for PCI Express devices. SR-IOV is a PCI-SIG
specification that defines a standard for creating natively shared devices. In
SR-IOV, packet multiplexing, address translation, and memory protection are
performed by the hardware. To perform address translation and memory protection
securely, SR-IOV uses Intel VT-d technology, which provides a hardware IOMMU.
The hypervisor is completely eliminated from the latency-sensitive I/O path, and
the CPU is no longer involved in copying data to and from the VM.
An SR-IOV-capable device is a PCIe device that can create multiple virtual
functions (VFs). A PCIe device is a collection of one or more functions. An
SR-IOV-capable device has one or more physical functions (PFs); each PF is a
standard PCIe function associated with multiple VFs. Each VF acts as a
light-weight PCIe function that is configured and managed by a PF. SR-IOV
comprises three components: the PF driver, the VF driver, and the SR-IOV manager
(IOVM).
The PF driver has access to all PF resources, and it manages and configures the
VFs. At startup, it sets the number of VFs, enables or disables them, sets device
configurations such as the MAC address and VLAN settings of a NIC, and configures
layer-2 switching. The VF driver executes in the guest OS and can access its VF
directly, without VMM involvement. A VF needs to duplicate performance-critical
resources such as DMA descriptors, whereas other resources are emulated by the
IOVM and the PF driver. The IOVM provides a virtual, full configuration view of
each VF to the guest OS, so that the guest can configure the VF as a physical
device. The IOVM also supports dynamic addition of VFs to the host, which are
then assigned to guest OSes; once a guest discovers its assigned VF, it can
initialize and configure it like any physical device. The PF and VF drivers
communicate with each other using a hardware-based producer/consumer technique:
the producer writes a message into a mailbox and rings the doorbell, and the
consumer consumes the message and notifies the producer by setting a bit in a
shared register.
SR-IOV achieves high (native) throughput and is also scalable. SR-IOV consumes
more CPU than native I/O, but much less than the paravirtualized I/O model; this
is due to VMM intervention for guest interrupt delivery. The VMM captures the
physical interrupt from the VF, maps it to a guest virtual interrupt, and injects
it. The VMM needs to emulate a virtual local APIC for HVM guests and an event
channel for PV guests. The authors of [4] and [2] have implemented designs in
which this guest interrupt delivery overhead is eliminated. Another problem with
SR-IOV is that it is extremely difficult to replicate the hardware state of the
NIC, due to the high frequency and non-deterministic nature of incoming packets.
The most intuitive solution to this problem is dynamic switching between the
directly accessed VF at run time and an emulated VF at migration time.
Intel 82576 [5] and Intel 82599 [6] provide a rich set of parameters configurable
from the hypervisor user space. These parameters can be utilized for VM network
management and for realizing network QoS. Interrupt Throttle Rate is a
configuration parameter for both the PF and the VF, whereas the remaining
parameters apply to the PF alone.
The details of the configurable parameters are as follows:
1. Interrupt Throttle Rate
This parameter limits the number of interrupts generated for incoming traffic
(a kind of limit on the RX queue size).
Valid Range: 0, 1, 3, 100-100000
Default Value: 3 (Dynamic Conservative)
2. LLI (Low Latency Interrupts)
(a) LLI port
This parameter allows immediate interrupt generation for a packet received
on a specific port, e.g., to reduce latency for TCP/RTP packets.
(b) LLI size
This parameter allows immediate interrupt generation for a packet smaller
than the specified size.
(c) LLI push
LLIPush can be enabled or disabled (default). It is most effective in an
environment with many small transactions.
Valid Range: 0-1
Default Value: 0 (disabled)
3. RSS (Receive Side Scaling)
Packets are routed to the specified number of queues, each processed by a
different processor. (Increasing the number of queues might reduce performance,
as cache hit rates might drop.)
Valid Range: 0-8; 0 selects the maximum supported number of queues
4. VMDq
This parameter specifies the number of queues to be added for each pool. (It is
most useful for 10G cards, which support a variety of queue configurations per
pool.)
Valid Range: 0-4 on 82575-based adapters; 0-8 for 82576/82580-based adapters
Default Value: 0
0 = Disabled
1 = Sets the netdev as pool 0
2+ = Adds additional queues, but they are currently not used
Enabling VMDq pools is needed to support SR-IOV; this parameter is forced to 1
or more if the max_vfs module parameter is used. The number of queues available
for RSS is limited if this is set to 1 or greater.
5. DMAC (DMA Coalescing)
This parameter enables or disables the DMA coalescing feature.
Valid Range: 0, 250, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000,
9000, 10000
Default Value: 0 (disabled)
Values are in microseconds and set the internal DMA coalescing timer. DMA
(Direct Memory Access) allows the network device to move packet data directly
to the system's memory, reducing CPU utilization. However, the frequency and
random intervals at which packets arrive do not allow the system to enter a
lower power state. DMA coalescing allows the adapter to collect packets before
it initiates a DMA event; this may increase network latency, but also increases
the chances that the system will enter a lower power state.
Network policies could be satisfied by configuring one or more of the parameters listed above.
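For concreteness, the knobs above are exposed as igb driver module parameters, taking one comma-separated value per NIC port. The helper below, which assembles a modprobe command line from a per-port configuration, is an illustrative sketch: the parameter names follow Intel's driver documentation, but the helper itself and the values are our own assumptions.

```python
# Sketch: assemble an igb module-parameter string from a per-port config.
# Parameter names follow Intel's igb driver documentation; the values and
# the helper itself are illustrative assumptions.

def igb_modprobe_line(per_port):
    """per_port: list of dicts, one per NIC port, e.g.
    {"InterruptThrottleRate": 3000, "RSS": 4}."""
    names = sorted({name for port in per_port for name in port})
    args = []
    for name in names:
        # igb takes comma-separated values, one per port.
        vals = [str(port.get(name, "")) for port in per_port]
        args.append("%s=%s" % (name, ",".join(vals)))
    return "modprobe igb " + " ".join(args)

line = igb_modprobe_line([
    {"InterruptThrottleRate": 3000, "RSS": 4},  # port 0: throttled, 4 RSS queues
    {"InterruptThrottleRate": 100,  "RSS": 1},  # port 1: low-latency leaning
])
assert line == "modprobe igb InterruptThrottleRate=3000,100 RSS=4,1"
```

A policy manager could generate such lines from the per-VM QoS configuration it derives for each host.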
Intel’s Data plane development kit, DPDK
“DPDK [17] is a set of optimized user-space software libraries and drivers that
can be used to accelerate packet processing on Intel architectures.”
DPDK achieves high performance for the following reasons:
1. With the paravirtualized I/O model, the split-driver model requires an
additional packet copy on send and receive. SR-IOV eliminates this additional
copy by directly assigning virtual functions to VMs, but the packet still needs
to be copied from kernel space to user space. DPDK maps the hardware registers
into user space and therefore does not require a packet copy from kernel space
to user space; if DPDK is run over SR-IOV hardware, we can achieve true
zero-copy.
2. DPDK can assign hardware queues to software flows, and these queues can be
pinned to a specific CPU. DPDK aggressively isolates CPUs by assigning a set of
software flows to a hardware queue on a single CPU. This directs all flows of a
single application to the same CPU, resulting in a better cache hit ratio and
hence better performance. This configuration can also be used to restrict the
number of CPUs assigned to handle specific network flows, resulting in
bandwidth partitioning.
3. DPDK supports batch packet processing, which can help improve the throughput
of the system.
4. DPDK uses huge-page memory sharing. The intention of this feature is
high-speed inter-VM communication: the packets of the source VM are placed in
shared memory, and the destination VM picks them up directly, without any
involvement of the hypervisor.
5. DPDK uses the SIMD extensions of Intel hardware to vectorize some operations,
which speeds up parallelizable tasks and improves throughput.
6. DPDK provides lockless queues. Throughput is often low because the
transmit/receive descriptor queues are locked by one process while other
processes wait. Eliminating locks, while handling the resulting race-condition
challenges, improves throughput.
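The benefit of DPDK's batch processing (point 3 above) can be seen with a toy cost model: each poll of the device carries a fixed overhead, so pulling packets in batches amortizes it across many packets. The numbers below are illustrative assumptions, not DPDK measurements.

```python
# Toy cost model of batched vs. per-packet processing: each poll of the
# device has a fixed overhead, so pulling packets in batches amortizes it.
# The constants are illustrative, not DPDK measurements.

POLL_COST = 10   # fixed cost per device poll (arbitrary units)
PKT_COST = 1     # per-packet processing cost

def total_cost(num_packets, batch_size):
    polls = -(-num_packets // batch_size)   # ceiling division
    return polls * POLL_COST + num_packets * PKT_COST

one_by_one = total_cost(1024, 1)   # 1024 polls: overhead paid per packet
batched = total_cost(1024, 32)     # 32 polls: overhead amortized over batches
assert batched < one_by_one
```

The same amortization argument applies to the increased I/O request batch size used by ClickOS later in this report.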
The Intel DPDK vSwitch application-specific options are detailed below [17]:
(switching-core option)
Specifies the CPU ID of the core on which the main switching loop will run.
-n NUM
Specifies the number of supported clients.
-p PORTMASK
A hexadecimal bitmask representing the ports to be configured, where each bit
represents a port ID; e.g., for a portmask of 0x3, ports 0 and 1 are configured.
-v NUM
The number of virtual Ethernet devices to configure; the maximum number of
virtual devices currently supported is eight (8).
--config (port,queue,lcore)[,(port,queue,lcore)]
Each port/queue/lcore group specifies the CPU ID of the core that will handle
ingress traffic for the specified queue on the specified port.
DPDK also provides dynamic flow manipulation using ovs-dpctl. A few commands
supported by ovs-dpctl are add-flow, del-flow, mod-flow, and get-flow; these are
used to dynamically add flows to a switch, delete flows, modify the actions on a
flow, and get the actions set on a particular flow, respectively. Details on
these commands can be obtained from [17].
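A management script could drive these commands programmatically. The sketch below only assembles ovs-dpctl command lines: the command names come from [17], but the datapath name and the flow-match syntax shown are illustrative assumptions, not the full grammar.

```python
# Sketch: build ovs-dpctl command lines for dynamic flow manipulation.
# Command names (add-flow/del-flow/mod-flow/get-flow) are from [17]; the
# flow-match syntax shown is illustrative, not a complete grammar.

import shlex

def dpctl_cmd(action, datapath, flow, actions=None):
    assert action in ("add-flow", "del-flow", "mod-flow", "get-flow")
    parts = ["ovs-dpctl", action, datapath, flow]
    if actions is not None:          # add-flow/mod-flow take an action list
        parts.append(actions)
    return " ".join(shlex.quote(p) for p in parts)

# Example: add a flow on hypothetical datapath "dp0" that forwards packets
# arriving on port 1 out of port 2.
cmd = dpctl_cmd("add-flow", "dp0", "in_port(1)", "2")
assert cmd == "ovs-dpctl add-flow dp0 'in_port(1)' 2"
```

In a real deployment, the control plane would execute such commands (e.g. via a remote shell) on each host that the manager has configured.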
Data-plane Management blocks
Software-defined data centers virtualize existing network functions like routers,
firewalls, and load balancers.

NetVM

NetVM [18] is motivated by NFV's requirement for high speed and low latency. The
authors of [18] provide:
1. Virtualization-based high-speed (line-rate) packet delivery
2. A memory-sharing framework for network data
3. Hypervisor-based switching
4. High-speed inter-VM communication
NetVM makes use of the Intel Data Plane Development Kit (DPDK) [17], with some
tuning, to achieve high performance. Figure 7 shows the working of NetVM.
Figure 7: NetVM System Overview (Source [18])
1. NetVM runs in the hypervisor user space
2. Memory is shared by all trusted VMs, to enable zero-copy packet delivery
3. Each VM has its own RX/TX packet-descriptor rings
The NetVM solution requires trusted VMs, which may not always be the case.
NetVM uses DPDK as the building block for software flow configurations. It
provides NetLib, an interface between the PCI device and user applications.
Users can read and write packets from user space through this interface, and
perform actions such as discarding a packet, forwarding it to another VM, or
sending it out to the NIC.
FasTrack

FasTrack [19] is motivated by the requirement of providing thousands of network
rules, policies, and traffic rate limits to achieve tenant isolation and QoS. It
exploits the temporal locality in flows and flow sizes, and offloads a subset of
the network rules from the hypervisor to the switch hardware. It dynamically
migrates these rules from the hypervisor to the switch, and vice versa.
Figure 8: FasTrack Overview
The FasTrack design comprises an SDN controller that decides which subset of
flows should be offloaded. Figure 8 shows that the SDN controller directs the
flows selected by the per-VM flow-placement module through the hypervisor PV Vif
driver, and the others through the SR-IOV VF driver. The flow-placement module
integrates with an OpenFlow interface, allowing the controller to program it.
Such configurations can be used to colocate two VMs where one requires
deterministic bandwidth (SR-IOV) and the other is fine with best-effort
service (PV).
Scalable NIC for end-host Rate limiting (SENIC)
Figure 9: SENIC: Schedule and Pull model (Source: [21])
SENIC [21] is an end-host rate-limiting technique with the following features:
1. Scales to thousands of traffic classes
2. Works at high link speeds
3. Has low CPU overhead
4. Provides accurate and precise rate limiting
5. Supports hypervisor bypass
Figure 9 shows the working of SENIC:
1. The OS notifies the NIC about the packet; per-class queues are scheduled and
stored in host RAM
2. The packet is DMAed from host memory to the NIC; late binding of the packet
to the NIC eliminates the need for expensive SRAM buffers
3. The packet is transmitted
SENIC supports tens of thousands of rate limiters with accurate and precise rate
limiting, so we could provide QoS guarantees for that many flows on a single
physical host. But SENIC requires hardware modifications; this building block
would be considered if some hardware device (like an SR-IOV NIC) implements
these modifications.
ClickOS [20] is motivated by the idea of shifting middlebox processing into software. ClickOS is a tiny Xen-based VM that runs Click. It has a single address space, runs on a single core, uses a non-preemptive scheduler, and executes a single application (Click); the VM just processes packets. The requirements of such a system would be:
1. Fast instantiation
2. Small footprint
3. Isolation
4. Performance
5. Flexibility
ClickOS matches the requirements by providing:
1. Boot times of less than 30 msec
2. A memory footprint of about 5 MB when running
3. Isolation provided by Xen
4. 10 Gbps line rate (in all cases except small packet sizes), with about 45 microsec of added per-packet delay
5. Flexibility provided by Click
Optimizations provided by ClickOS to achieve high packet rates are:
1. Reusing Xen page permissions (front-end)
2. Introducing VALE, a fast netmap-based software switch (80 Mp/s), as the backend switch
3. Increasing the I/O request batch size
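The benefit of optimization 3 can be seen with a back-of-the-envelope model: each hypervisor ring notification carries a fixed overhead, and batching amortizes it across many packets. The costs below are illustrative, not measured values from the paper.

```python
# Why larger I/O request batches raise packet rate: the fixed
# per-notification cost is shared across the whole batch.

def pkts_per_sec(batch, per_pkt_ns=50, per_notify_ns=2000):
    """Achievable packet rate given an illustrative per-packet cost and
    a fixed notification cost amortized over `batch` packets."""
    cost_per_pkt_ns = per_pkt_ns + per_notify_ns / batch
    return 1e9 / cost_per_pkt_ns

print(round(pkts_per_sec(1)))    # one notification per packet
print(round(pkts_per_sec(32)))   # cost amortized over 32 packets
```

With these made-up costs the batched case is over an order of magnitude faster, which is the effect the ClickOS optimizations exploit.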
ClickOS can be used as the building block for devices like logical switches, logical routers, and other NFV devices. OpenFlow support is available to provide flow-level control.
Table 4: Comparison of different data-plane management blocks. FasTrack, NetVM, and ClickOS are compared along the following dimensions:
• Support for thousands of flow rate-limiters
• Whether trust between VMs is required
• Whether only specific applications can be provisioned
• Line-rate support
• Migration support
Problem Formulation
Requirements for a Network-QoS solution
Following are the requirements for a Network-QoS solution:
1. Provide minimum guarantees
• Per-flow rate-limiting to achieve bandwidth guarantees
• Per-flow delay guarantees
2. High-speed packet I/O, i.e., high network-resource utilization
3. High-speed inter-VM communication
Some of the use-cases would be NFVs and the tiers of a multi-tier application, where VMs are chained.
Inter-tenant traffic example: Amazon AWS offers sixteen services that result in network traffic between tenant VMs and service instances. These services provide diverse functionality, ranging from storage to load balancing and monitoring.
4. Performance Isolation
QoS control of one VM should not degrade the performance of other VMs.
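The per-flow bandwidth guarantee of requirement 1 is classically enforced with a token bucket: tokens accrue at the guaranteed rate, and a packet is admitted only when enough tokens are available. The sketch below is a generic illustration, not a mechanism from any of the surveyed systems.

```python
# Generic token-bucket rate limiter for a single flow.

class TokenBucket:
    def __init__(self, rate_Bps, burst_B):
        self.rate = rate_Bps          # guaranteed rate, bytes/sec
        self.capacity = burst_B       # maximum burst, bytes
        self.tokens = burst_B
        self.last = 0.0

    def allow(self, now, pkt_bytes):
        """Admit the packet if the flow has earned enough tokens."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return True
        return False

tb = TokenBucket(rate_Bps=1000, burst_B=1500)
print(tb.allow(0.0, 1500))   # True: burst allowance covers the packet
print(tb.allow(0.1, 1500))   # False: only ~100 bytes of tokens accrued
print(tb.allow(1.6, 1500))   # True: 1.5 s of accrual refills the bucket
```

Running one such bucket per flow (in software or in NIC hardware, as in SENIC) gives the per-flow rate limits that requirement 1 asks for, while the cap on burst size bounds the interference one flow can cause, which is the essence of requirement 4.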
Researchers have provided solutions for 'Transmit Bandwidth Guarantees' through software. We propose to provide a QoS solution through hardware, using 82576/82599 SR-IOV NICs. We intend to provide TX/RX bandwidth guarantees, and also packet-delivery-delay guarantees after packet reception. Researchers have not considered the RX bandwidth, and high receive traffic of one VM could take up the resources of other VMs, reducing their throughput.
Expected Input:
• For each flow within a VM, and for each VM within a single physical machine: the QoS requirement, i.e., expected transmit/receive bandwidth values, and the expected delay between packet reception and delivery
• Physical-host details: available bandwidth, percentage of CPU available for network I/O, and SR-IOV NIC specifications such as the number of queues/pools
Expected Output:
• SR-IOV configuration parameters and their values, such that the requirements of all flows are satisfied.
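The mapping from expected input to expected output can be sketched as a feasibility check followed by a per-VF allocation. The function and field names below are hypothetical; a real solution must also place individual flows into queues and honor the CPU and delay constraints.

```python
# Hypothetical sketch: given per-VM bandwidth demands and the SR-IOV
# NIC's capabilities, check feasibility and emit per-VF settings.

def plan_sriov_config(vm_demands_mbps, link_mbps, num_vf_pools):
    if len(vm_demands_mbps) > num_vf_pools:
        raise ValueError("more VMs than available VF pools")
    if sum(vm_demands_mbps.values()) > link_mbps:
        raise ValueError("aggregate demand exceeds link bandwidth")
    # one VF per VM, rate-limited to its guaranteed bandwidth
    return {vm: {"vf": i, "tx_limit_mbps": mbps}
            for i, (vm, mbps) in enumerate(sorted(vm_demands_mbps.items()))}

cfg = plan_sriov_config({"vm1": 600, "vm2": 300, "vm3": 100},
                        link_mbps=1000, num_vf_pools=8)
print(cfg["vm1"]["tx_limit_mbps"])  # 600
```

If either check fails, the flows cannot all be admitted on this host, which connects this configuration problem to VM placement and migration decisions.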
Solution for transmit bandwidth partitioning [9]
The authors of [9] specify a mechanism for transmit bandwidth partitioning of the 1 Gbps link in 82576 controllers. They provide a configuration method to set the bandwidth of each VF by setting the RF_DEC and RF_INT register values. Details of these registers are available in the 82576 datasheet [5] and are also shown in Figure 10.
Figure 10: Bandwidth Control of Intel 82576 GbE (Source: [5])
After making the VFs assignable, and before assigning them to VMs, the bandwidth of each VF can be configured using the command:
#echo "600 300 100" > /sys/class/net/eth1/device/bandwidth_allocation
The above command partitions the 1 Gbps NIC bandwidth into 600 Mbps, 300 Mbps, and 100 Mbps for VF1, VF2, and VF3, respectively.
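For NICs that express a VF rate limit as a "rate factor" (as in 82599-style VF TX rate limiting, where the hardware stores link_speed/target_rate as an integer part, RF_INT, plus a 14-bit fractional part, RF_DEC), the register values can be derived as below. The exact register semantics must be taken from the datasheet [5]; this is an illustrative sketch, not driver code.

```python
# Illustrative derivation of RF_INT/RF_DEC from a target rate, assuming
# the rate factor is link_speed / target_rate with a 14-bit fraction.

def rate_factor(link_mbps, target_mbps, frac_bits=14):
    factor = link_mbps / target_mbps
    rf_int = int(factor)                              # integer part
    rf_dec = int((factor - rf_int) * (1 << frac_bits))  # fixed-point fraction
    return rf_int, rf_dec

print(rate_factor(1000, 300))  # (3, 5461): 1000/300 = 3.333...
```

A driver would then program these two fields into the per-VF rate-control register to cap that VF's transmit bandwidth.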
In this report, we have studied the literature on network virtualization and provided an analysis of its basic data-plane building blocks. We propose a methodology to achieve application QoS, and walk through the network-virtualization stack with the help of an example to show how application QoS can be achieved. We have also identified other possible problem areas in this domain.
[1] Yaozu Dong, Xiaowei Yang, Xiaoyong Li, Jianhui Li, Kun Tian, and Haibing
Guan. High Performance Network Virtualization with SRIOV. In HPCA, 2010.
[2] Nadav Har'El, Abel Gordon, and Alex Landau. Efficient and Scalable Paravirtual I/O System, a.k.a. ELVIS. In ATC, 2013.
[3] Jianglu Chen, Jian Li, and Fei Hu. SR-IOV based Virtual Network Sharing. In
Proceedings of the Second International Conference on Innovative Computing
and Cloud Computing, 2013.
[4] Abel Gordon, Nadav Amit, Nadav Har'El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, and Dan Tsafrir. ELI: Bare-Metal Performance for I/O Virtualization. In ASPLOS, 2012.
[5] Intel 82576 SR-IOV Driver Companion Guide, Revision 1.00, June 2009.
[6] Intel 82599 SR-IOV Driver Companion Guide, Revision 1.00, May 2010.
[7] igb Linux Base Driver for Intel Ethernet Network Connection, July 2013.
[8] igbvf Linux Base Driver for Intel Ethernet Network Connection, January 2014.
[9] Jun Kamada and Simon Horman. Evaluation and improvement of I/O scalability for Xen. In Xen Summit Asia, Nov. 2009.
[10] Binbin Zhang, Xiaolin Wang, Rongfeng Lai, Liang Yang, Yingwei Luo, Zhenlin Wang, and Xiaoming Li. A Survey on I/O Virtualization and Optimization. In ChinaGrid, 2010.
[11] Himanshu Raj and Karsten Schwan. High Performance and Scalable I/O Virtualization via Self-Virtualized Devices. In HPDC, 2007.
[12] Brad Hedlund, Scott Lowe, and Ivan Pepelnjak. VMware NSX Architecture, webinar series. YouTube, October 2013.
[13] The VMware NSX Network Virtualization Platform. Technical White Paper.
[14] VMware NSX Network Virtualization Design Guide.
[15] VMware NSX The Platform for Network Virtualization, Datasheet.
[16] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the Art of Virtualization. In SOSP, 2003.
[17] Intel Corporation. Intel data plane development kit: Getting started guide.
[18] Jinho Hwang, K. K. Ramakrishnan, and Timothy Wood. NetVM: High Performance and Flexible Networking using Virtualization on Commodity Platforms. In NSDI, 2014.
[19] Radhika Niranjan Mysore, George Porter, and Amin Vahdat. FasTrak: Enabling Express Lanes in Multi-Tenant Data Centers. In CoNEXT, 2013.
[20] Joao Martins, Mohamed Ahmed, Costin Raiciu, Vladimir Olteanu, Michio Honda, Roberto Bifulco, and Felipe Huici. ClickOS and the Art of Network Function Virtualization. In NSDI, 2014.
[21] Sivasankar Radhakrishnan, Yilong Geng, Vimalkumar Jeyakumar, Abdul Kabbani, George Porter, and Amin Vahdat. SENIC: Scalable NIC for End-Host Rate Limiting. In NSDI, 2014.