On Realizing Application QoS on a
Virtualized Network
Annual Progress Seminar Report
Submitted in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
Rinku Shah
(Roll No. 134053001)
Under the guidance of
Prof. Umesh Bellur
Prof. Purushottam Kulkarni
Department of Computer Science and Engineering
Indian Institute of Technology, Bombay
August, 2014
I would like to take this opportunity to thank my advisors, Prof. Umesh Bellur and Prof.
Purushottam Kulkarni, for their constant guidance, direction, and valuable feedback.
I would also like to thank Prof. Varsha Apte and Prof. Mythili Vutukuru for their
support and feedback. I would like to thank my colleague Debadatta Mishra for lending
me a helping hand. Lastly, I would like to thank my family and friends for their constant
love and support.
Rinku Shah
Machine virtualization has proved beneficial to both service providers and service
users. Service providers can increase financial gains by using techniques like migration
and consolidation of virtual machines, thereby requiring a smaller number of physical
machines. Service users get a software view of their machines: they can easily create,
destroy, migrate, clone, and snapshot their virtual machines, and they pay only for
what they use.
After the success of ‘Compute Virtualization’, the world is now interested in
‘Network Virtualization’, which promises the same features for networks. If the
network becomes a piece of software, it is easy to create, destroy, migrate, clone,
and snapshot the network. But this new bag of features should not come at the
cost of performance.
In this report, we propose a methodology for realizing Application QoS in a
Virtualized Network. We also identify other possible research problem areas for
a Software-defined Data-center solution that aim at increasing resource utilization
and improving application performance.
Contents

1 Work Progress
  Courses Taken
  Seminar Presentation
2 Network Virtualization
  Why do we require to Virtualize Networks?
  Network Virtualization
    The Network Virtualization Stack
    Areas for Exploration
3 Building Blocks of a Virtualized Network
  Data-plane building blocks
    PV-Bridging
    SR-IOV
    Intel’s Data plane development kit, DPDK
  Data-plane Management blocks
    NetVM
    FasTrack
    Scalable NIC for end-host Rate limiting (SENIC)
    ClickOS
  Summary
4 Problem Formulation
  Requirements for a Network-QoS solution
  Hypothesis
  Solution for transmit bandwidth partitioning [9]
5 Conclusion
List of Figures

1 The Network Virtualization Stack
2 Example Network Topology
3 Example Network Topology with Performance Metrics
4 Steps involved for achieving Application-QoS guarantees for a virtualized network
5 Paravirtualized Device Model (Source: [10])
6 SR-IOV virtualization architecture (Source: [11])
7 NetVM System Overview (Source: [18])
8 FasTrack Overview
9 SENIC: Schedule and Pull model (Source: [21])
10 Bandwidth Control of Intel 82576 GbE (Source: [5])
Work Progress
Courses Taken
Table 1: Courses Taken (2013-14)

Course Code   Course Title
CS 681        Performance Analysis of Computer Systems and Network
CS 683        Advanced Computer Architecture
CS 699        Software Lab
HS 699        Communication and Presentation Skills
TD 610        Contemporary Critical Issues in Technology and Development
Seminar Presentation
Network I/O virtualization is a challenging problem to solve. In the seminar report,
various causes of network I/O virtualization overhead were analyzed. The report
also provided a detailed classification of network I/O virtualization optimization
techniques, and compared the performance of the paravirtualized, emulated, and
direct I/O models with respect to network I/O access.
The report concluded by identifying open challenges:
1. Dynamic setup of VM network I/O policies
2. Offloading Netfilter functions
3. Live migration in the direct I/O model
4. Scalability in the direct I/O model
CPI = 9.08
Network Virtualization
Virtual networks, like virtual machines, are automatically provisioned and managed,
independent of the underlying hardware. Various network virtualization solutions
offer migration of hardware network components into logical software components.
Translating a hardware component into a software object provides numerous benefits:
these software-translated hardware entities can easily be created, destroyed, migrated,
and cloned. But along with all these benefits comes a performance trade-off, as tasks
previously performed in hardware are now performed by software. Every network
virtualization solution provider needs to solve this performance problem.
Why do we require to Virtualize Networks?
There are various reasons for Network Virtualization’s popularity [12].
1. Network Abstraction
Network virtualization provides an abstraction of a network and bundles all
network components together as a software object. The network can then easily
be created, deleted, changed, migrated, snapshotted, and cloned.
2. Quick, On-demand, and Automated Network Provisioning
Network virtualization solutions prepare the complete network model in software
and can enable any network topology in seconds. They provide a library of logical
networking elements and services, such as logical switches, routers, firewalls, load
balancers, VPNs, and workload security.
3. Workload Placement
Network virtualization can provide large virtual segments for workload placement.
Without it, workload placement and mobility are confined to individual physical
subnets: network operators must manually configure VLANs, ACLs, firewall rules,
etc. on the hardware boxes. This is a slow process, and it also hits configuration
limits; for example, at most 4096 VLANs can be configured.
4. Live Mobility
With network virtualization, it is possible to live-migrate the entire network,
or even a single VM of the network, without loss of layer-2 connectivity.
5. Elasticity and Scalability
The technology choices offered by network virtualization solutions do not limit
the number of tenants, and components and services can be dynamically added to
or removed from an existing network.
All these features favour the evolution of virtualized network systems. It is also
necessary for virtualized network systems to offer at least equivalent performance.
Our research intends to analyze and provide solutions for a performance-aware
virtualized network system.
Network Virtualization
The Network Virtualization Stack
Figure 1: The Network Virtualization Stack. Top to bottom: Cloud Management
Platform (CMP); Physical & Logical Network Management Plane (Network QoS and
rule management, VM placement & migration, inter-VM flows, physical/logical object
failure management); Physical & Logical Network Control Plane (tunnel creation at
VMs and physical hosts, VM placement and migration, routing control for internal
logical networks and for the outside); Data Plane Building Blocks; Data Plane;
Physical layer.
The typical layers of any Software-defined Data-center solution are shown in
Figure 1. A typical network virtualization solution provider divides its architecture
into three layers: Physical, Virtual, and Cloud.
The Physical layer consists of hypervisor hosts and appliances such as service
nodes and gateways. Gateways provide connectivity to the outside world; service
nodes provide centralized functionality, such as tunneling for unicast and multicast
traffic. A typical Virtual layer has the following logical planes: Data plane, Data-plane
management, Control plane, and Management plane.
A typical Cloud layer comprises solutions that provide cloud services, i.e. the
Cloud Management Platform (CMP). The CMP is the consumer of this network
virtualization system; it inputs the network topology through a sequence of API
calls. The Application layer comprises multi-tiered or single-tiered applications, or
NFV applications such as firewalls, intrusion detection systems, and load balancers.
We take a multi-tiered application, comprising a web tier, an application tier,
and a database tier, as the running example to drive through the network
virtualization stack. During this traversal, we propose our methodology for providing
Application QoS guarantees for the virtualized network. Current systems consider
only the application traffic classes to guarantee application QoS.
Figure 2: Example Network Topology
Consider the example network topology of a multi-tiered system shown in
Figure 2. To guarantee application QoS, we need the performance metrics between
each pair of communicating nodes in the topology. Our assumption is that these
performance metrics are given by the customer of the multi-tiered system.

In our example, HTTP requests arrive from the outside world at the web server;
we name the outside-world entity ‘External’. The web server decodes the request
type and accordingly forwards the request to either Application Server 1 or
Application Server 2. The application servers may, in turn, query the database
server. We chose this scenario to exemplify a case where some tasks have higher
priority than others, i.e., the database server should provide different processing
delays for flows coming from Application Server 1 and Application Server 2. For
example, in a banking application, one app server is responsible for handling
real-time operations like ‘Withdrawal’, whereas the other app server is responsible
for handling non-real-time operations like cheque-book request processing. Clearly,
the ‘Withdrawal’ operation should be given higher priority and be processed faster.

Figure 3: Example Network Topology with Performance Metrics (edges annotated
with tuples such as <B_EW, B_WE>, <B_WA1, B_A1W>, <B_WA2, B_A2W>,
<B_A1D, B_DA1, D_DA1>, and <B_A2D, B_DA2, D_DA2>)

The example in Figure 3 provides the performance metric values between the
communicating nodes. B_ij is the convention used to represent the bandwidth
requirement between nodes ‘i’ and ‘j’. Similarly, ProcD_ij represents the maximum
processing delay that can be tolerated between nodes ‘i’ and ‘j’.
We use ‘E’ to represent nodes external to the data center; ‘W’, ‘A1’, ‘A2’, and
‘D’ represent the web server, application server 1, application server 2, and the
database server, respectively.
The application QoS information can be fed as input to the data-center manager
node through an API, in tabular form as shown in Tables 2 and 3.

Table 2: Bandwidth guarantees requested for the sample topology

Edge:       E-W    W-E    W-A1    A1-W    W-A2    A2-W    A1-D    D-A1    A2-D    D-A2
Guarantee:  B_EW   B_WE   B_WA1   B_A1W   B_WA2   B_A2W   B_A1D   B_DA1   B_A2D   B_DA2

Table 3: Processing Delay guarantees requested for the sample topology

Edge:       D-A1        D-A2
Guarantee:  ProcD_DA1   ProcD_DA2
Figure 4: Steps involved for achieving Application-QoS guarantees for a virtualized network
The following sequence of steps, shown in Figure 4, is involved in achieving Application QoS guarantees:
1. The customer provides three inputs to the Cloud Management Platform (CMP):
the network topology (an adjacency matrix), the requested bandwidth-guarantee
matrix, and the requested processing-delay-guarantee matrix.
2. The CMP generates a sequence of API calls to the Data-center manager for
the requested topology.
3. A manager node resides in the management plane. It is the single point of
configuration through which the virtual network requirements are fed into the
Software Defined Data Center. Any CMP can communicate with the manager
through an API; this plane exposes a rich set of API libraries to the outside
world.
The manager node always has the complete picture of the data center. It takes
in the requirement in the form of a network topology and a set of performance
constraints. It creates a set of configuration policies to satisfy the performance
requirements, and it makes placement decisions considering the available
resources and the requested QoS. A number of other management tasks, such as
failure recovery and migration, are also taken care of by this layer.
The data-center manager node receives the API calls from the CMP. It has the
complete view of the data center; i.e., it is aware of the resource availability
and capability of each physical entity in the data center. The capability of a
device means the kind of data-plane building block that is available on it.
Using the capability and availability information, the policy manager looks up
the bandwidth and delay requirements, and specifies the data-plane building
block, along with its configuration details, to be used to satisfy the required
QoS.
For each logical object (node), the manager provides:
• The data-plane building block to be used.
• The set of data-plane management configurations that satisfy the requested
QoS metric values.
• The physical host on which the object is to be placed. We would use the
placement algorithms provided by the manager, and do not intend to modify
them.
For the entire network topology, the layer-2 connectivity configuration is also
generated by the manager. It is then the job of the control plane to implement
the set of configurations delivered by the manager.
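As a sketch of the decision described above, the manager's mapping from host capability and requested QoS to a data-plane building block might look as follows. The capability names and selection rules here are illustrative assumptions, not the actual manager logic:

```python
# Illustrative (not the actual manager logic): choose a data-plane building
# block for a logical object, given the host's capability and requested QoS.

def choose_building_block(host_capabilities, needs_guaranteed_bw, needs_low_latency):
    """Return the building block to configure on the chosen host.

    host_capabilities: set of blocks available on the host,
    e.g. {"sriov", "dpdk", "pv"}.
    """
    if needs_low_latency and "dpdk" in host_capabilities:
        return "dpdk"    # user-space packet processing, lowest latency
    if needs_guaranteed_bw and "sriov" in host_capabilities:
        return "sriov"   # hardware VFs give deterministic bandwidth
    return "pv"          # paravirtualized bridging as best-effort default

# Example: a host with SR-IOV but no DPDK, and a flow needing a bandwidth
# guarantee, gets an SR-IOV virtual function.
block = choose_building_block({"sriov", "pv"},
                              needs_guaranteed_bw=True,
                              needs_low_latency=False)
assert block == "sriov"
```

A real manager would also consult resource availability on each candidate host before committing to a placement.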
Control plane
A typical controller is responsible for executing control-plane requirements such
as creating tunnels between VM nodes for layer-2 connectivity, network QoS
policy control, and controlling firewall rules, ACLs, etc. The controller nodes
are deployed as a cluster to ensure high availability and scalability. The
controller cluster configures all the soft switches with the configuration
provided by the manager node. Control-plane nodes act as actuators that
implement the configuration and policies specified by the management modules.
Control-plane modules also need to ensure that routing paths and configurations
are correctly disseminated for internal as well as external paths; to achieve
this, they need to virtualize routing protocols like IS-IS.
Data plane & Data-plane management
A typical data plane consists of the physical-layer abstraction of devices like
switches and routers; it also enables access-level switching. Virtual network
segments are implemented between soft switches using MAC-over-IP encapsulation;
to support overlay networking, GRE, STT, or VXLAN encapsulation can be used.
Examples of data-plane components are the vSphere Distributed Switch (VDS) for
vSphere, or Open vSwitch for non-ESX hypervisors.
The data-plane management components manage and configure the data-plane
components to implement tasks like satisfying network QoS and implementing
layer-2 segment connectivity for the VMs. Examples of data-plane management
components are solutions like OpenFlow, NetVM, and FasTrack.
3*. There could be a case where enough resources are not available to satisfy
the requested QoS values. Under such circumstances, the data-center manager
generates feedback to the CMP, requesting changes in the QoS metric values.
4. The CMP passes the request on to the customer.
5. The customer provides the detailed network topology with modified QoS metric
values, and the process repeats from step 2.
Areas for Exploration
Application Composition/Decomposition
In certain applications, input data needs to traverse multiple logical stages.
For example, in NFV systems like an Intrusion Detection System (IDS), data packets
may flow from one module to another, resulting in module chains. If we could
decompose the IDS virtual object into multiple logical objects, it would be easier
to scale the system.
Figuring out the appropriate decomposition of an application into modules would
help in better resource utilization and elasticity, while keeping related sets of
modules together would improve performance. Figuring out an appropriate
application composition and decomposition is one possible problem area.
Performance vs. Scale Tradeoff
Application composition/decomposition would help in giving a flat design, with
higher resource utilization and better scalability. But as the chain of modules
grows, the end-to-end delays increase. Figuring out a ‘just-right’
composition/decomposition configuration, such that the required performance
criteria are met and the application still scales well, is another possible
problem area.
Building Blocks of a Virtualized Network
We have analyzed some of the data-plane building blocks, and derived methods for
configuring the data-plane objects through the data-plane management objects so
as to achieve the required QoS metric values. We intend to identify all such
building blocks and perform the same analysis. The following sections detail the
working and the configuration parameters/methods of each building block.
Data-plane building blocks

PV-Bridging

Figure 5: Paravirtualized Device Model (Source: [10])
In the paravirtualized I/O model [10], device drivers are split into two parts:
a front-end and a back-end driver. The front-end driver is installed in the guest
VM, whereas the back-end driver is installed in a privileged VM (a.k.a. the
driver domain, or Domain 0); the guest OS needs to be modified to achieve this.
When an I/O operation is requested by a guest application, the front-end driver
forwards the request to the back-end driver. The back-end driver decodes the
request, maps it to the hardware, and directs the device to complete execution.
The back-end driver can help in managing resource control as well as many
optimizations, like batching guest requests; several such optimizations are
implemented to improve its efficiency. A packet transmission takes place
according to the following steps:
1. Packet copy/remap from the VM's front-end driver to the driver domain's
back-end driver
2. Route the packet through the Ethernet bridge to the physical NIC's driver
3. Enqueue the packet for transmission on the network interface
A packet reception takes place according to the following steps:
1. The NIC generates an interrupt, which is captured by the hypervisor
2. The captured interrupt is routed from the hypervisor to the NIC's device
driver in the driver domain as a virtual interrupt
3. The NIC's device driver transfers the packet to the Ethernet bridge
4. The bridge routes the packet to the appropriate back-end driver
5. The back-end driver copies/remaps the packet to the front-end driver in the
target VM
6. The back-end driver requests the hypervisor to send a virtual interrupt to
the front-end driver in the target VM
7. The hypervisor sends the virtual interrupt to the front-end driver of the
target VM
8. The front-end driver delivers the packet to the VM's network stack
As discussed above, the paravirtualized I/O model has central control over the
network resources, because all incoming and outgoing packets go through the
hypervisor. It is therefore possible to specify network QoS policies at the
hypervisor; we could also have bandwidth partitioning performed at the VM's
back-end driver. The problem with this model is that it consumes very high CPU
to achieve line rates.
SR-IOV

Figure 6: SR-IOV virtualization architecture (Source: [11])
Single Root I/O Virtualization and Sharing (SR-IOV) [11] is a fast I/O
virtualization standard for PCI Express devices. SR-IOV is a PCI-SIG
specification that defines a standard for creating natively shared devices. In
SR-IOV, packet multiplexing, address translation, and memory protection are
performed by the hardware. To perform address translation and memory protection
securely, SR-IOV uses Intel VT-d technology, which provides a hardware IOMMU.
The hypervisor is completely eliminated from the latency-sensitive I/O path, and
the CPU is no longer involved in copying data to and from the VM.
An SR-IOV-capable device is a PCIe device that can create multiple virtual
functions (VFs). A PCIe device is a collection of one or more functions. An
SR-IOV-capable device has one or more physical functions (PFs); each PF is a
standard PCIe function associated with multiple VFs. Each VF acts as a
light-weight PCIe function that is configured and managed by a PF. SR-IOV
comprises three components: the PF driver, the VF driver, and the SR-IOV manager
(IOVM).
The PF driver has access to all PF resources, and it manages and configures the
VFs. At startup, it sets the number of VFs, enables or disables them, sets device
configurations such as the MAC address and VLAN settings of a NIC, and configures
layer-2 switching. The VF driver executes in the guest OS and can access its VF
directly, without VMM involvement. A VF needs to duplicate performance-critical
resources such as DMA descriptors, whereas other resources are emulated by the
IOVM and the PF driver. The IOVM provides a virtual, full configuration view of
each VF to the guest OS, so that the guest can configure the VF as a physical
device. The IOVM also supports dynamic addition of VFs to the host, which are
then assigned to guest OSes; once a guest discovers its assigned VF, it can
initialize and configure it like any physical device. The PF and VF drivers
communicate with each other using a hardware-based producer/consumer technique:
the producer writes a message into a mailbox and rings the doorbell, and the
consumer consumes the message and notifies the producer by setting a bit in a
shared register.
SR-IOV achieves high (native) throughput and is also scalable. SR-IOV consumes
more CPU than native I/O, but much less than the paravirtualized I/O model; this
is due to VMM intervention for guest interrupt delivery. The VMM captures the
physical interrupt from the VF, maps it to a guest virtual interrupt, and injects
it. The VMM needs to emulate a virtual local APIC for HVM guests and an event
channel for PV guests. The authors of [4] and [2] have implemented designs in
which this guest interrupt delivery overhead is eliminated. Another problem with
SR-IOV is that it is extremely difficult to replicate the hardware state of the
NIC, due to the high frequency and non-deterministic nature of incoming packets.
The most intuitive solution to this problem is dynamic switching between the
directly accessed VF at run time and an emulated VF at migration time.
Intel 82576 [5] and Intel 82599 [6] provide a rich set of parameters configurable
from the hypervisor user space. These parameters can be utilized for VM network
management and for realizing network QoS. Interrupt Throttle Rate is a
configuration parameter for both the PF and the VF, whereas the remaining
parameters apply to the PF alone.
The details of the configurable parameters are as follows:
1. Interrupt Throttle Rate
This parameter limits the number of interrupts generated for incoming traffic
(a kind of limit on the RX queue size).
Valid Range: 0, 1, 3, 100-100000
Default Value: 3 (Dynamic Conservative)
2. LLI (Low Latency Interrupts)
(a) LLI port
This parameter allows immediate interrupt generation for a packet received
on a specific port, e.g., to reduce latency for TCP/RTP packets.
(b) LLI size
This parameter allows immediate interrupt generation for a packet smaller
than the specified size.
(c) LLI push
LLIPush can be enabled or disabled (default). It is most effective in an
environment with many small transactions.
Valid Range: 0-1
Default Value: 0 (disabled)
3. RSS (Receive Side Scaling)
Packets are routed to the specified number of queues, each processed by a
different processor. (Increasing the number of queues might reduce performance,
as cache hit rates might drop.)
Valid Range: 0-8; 0 selects the maximum supported number of queues
4. VMDq
This parameter specifies the number of queues to be added for each pool. (It is
most useful for 10G cards, which support a variety of queue configurations per
pool.)
Valid Range: 0-4 on 82575-based adapters; 0-8 for 82576/82580-based adapters
Default Value: 0
0 = Disabled
1 = Sets the netdev as pool 0
2+ = Adds additional queues, but they are currently not used
Enabling VMDq pools is needed to support SR-IOV; this parameter is forced to 1
or more if the max_vfs module parameter is used. The number of queues available
for RSS is limited if this is set to 1 or greater.
5. DMAC (DMA Coalescing)
This parameter enables or disables the DMA coalescing feature.
Valid Range: 0, 250, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000,
9000, 10000
Default Value: 0 (disabled)
Values are in microseconds and set the internal DMA coalescing timer. DMA
(Direct Memory Access) allows the network device to move packet data directly
to the system's memory, reducing CPU utilization. However, the frequency and
random intervals at which packets arrive do not allow the system to enter a
lower power state. DMA coalescing allows the adapter to collect packets before
it initiates a DMA event; this may increase network latency, but also increases
the chances that the system will enter a lower power state.
Network policies could be satisfied by configuring one or more of the parameters listed above.
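For concreteness, the knobs above are exposed as igb driver module parameters, taking one comma-separated value per NIC port. The helper below, which assembles a modprobe command line from a per-port configuration, is an illustrative sketch: the parameter names follow Intel's driver documentation, but the helper itself and the values are our own assumptions.

```python
# Sketch: assemble an igb module-parameter string from a per-port config.
# Parameter names follow Intel's igb driver documentation; the values and
# the helper itself are illustrative assumptions.

def igb_modprobe_line(per_port):
    """per_port: list of dicts, one per NIC port, e.g.
    {"InterruptThrottleRate": 3000, "RSS": 4}."""
    names = sorted({name for port in per_port for name in port})
    args = []
    for name in names:
        # igb takes comma-separated values, one per port.
        vals = [str(port.get(name, "")) for port in per_port]
        args.append("%s=%s" % (name, ",".join(vals)))
    return "modprobe igb " + " ".join(args)

line = igb_modprobe_line([
    {"InterruptThrottleRate": 3000, "RSS": 4},  # port 0: throttled, 4 RSS queues
    {"InterruptThrottleRate": 100,  "RSS": 1},  # port 1: low-latency leaning
])
assert line == "modprobe igb InterruptThrottleRate=3000,100 RSS=4,1"
```

A policy manager could generate such lines from the per-VM QoS configuration it derives for each host.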
Intel’s Data plane development kit, DPDK
“DPDK [17] is a set of optimized user-space software libraries and drivers that
can be used to accelerate packet processing on Intel architectures.”
DPDK achieves high performance for the following reasons:
1. With the paravirtualized I/O model, the split-driver model requires an
additional packet copy on send and receive. SR-IOV eliminates this additional
copy by directly assigning virtual functions to VMs, but the packet still needs
to be copied from kernel space to user space. DPDK maps the hardware registers
into user space and therefore does not require a packet copy from kernel space
to user space; if DPDK is run over SR-IOV hardware, we can achieve true
zero-copy.
2. DPDK can assign hardware queues to software flows, and these queues can be
pinned to a specific CPU. DPDK aggressively isolates CPUs by assigning a set of
software flows to a hardware queue on a single CPU. This directs all flows of a
single application to the same CPU, resulting in a better cache hit ratio and
hence better performance. This configuration can also be used to restrict the
number of CPUs assigned to handle specific network flows, resulting in
bandwidth partitioning.
3. DPDK supports batch packet processing, which can help improve the throughput
of the system.
4. DPDK uses huge-page memory sharing. The intention of this feature is
high-speed inter-VM communication: the packets of the source VM are placed in
shared memory, and the destination VM picks them up directly, without any
involvement of the hypervisor.
5. DPDK uses the SIMD extensions of Intel hardware to vectorize some operations,
which speeds up parallelizable tasks and improves throughput.
6. DPDK provides lockless queues. Throughput is often low because the
transmit/receive descriptor queues are locked by one process while other
processes wait. Eliminating locks, while handling the resulting race-condition
challenges, improves throughput.
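The benefit of DPDK's batch processing (point 3 above) can be seen with a toy cost model: each poll of the device carries a fixed overhead, so pulling packets in batches amortizes it across many packets. The numbers below are illustrative assumptions, not DPDK measurements.

```python
# Toy cost model of batched vs. per-packet processing: each poll of the
# device has a fixed overhead, so pulling packets in batches amortizes it.
# The constants are illustrative, not DPDK measurements.

POLL_COST = 10   # fixed cost per device poll (arbitrary units)
PKT_COST = 1     # per-packet processing cost

def total_cost(num_packets, batch_size):
    polls = -(-num_packets // batch_size)   # ceiling division
    return polls * POLL_COST + num_packets * PKT_COST

one_by_one = total_cost(1024, 1)   # 1024 polls: overhead paid per packet
batched = total_cost(1024, 32)     # 32 polls: overhead amortized over batches
assert batched < one_by_one
```

The same amortization argument applies to the increased I/O request batch size used by ClickOS later in this report.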
The Intel DPDK vSwitch application-specific options are detailed below [17]:
(switching-core option)
Specifies the CPU ID of the core on which the main switching loop will run.
-n NUM
Specifies the number of supported clients.
-p PORTMASK
A hexadecimal bitmask representing the ports to be configured, where each bit
represents a port ID; e.g., for a portmask of 0x3, ports 0 and 1 are configured.
-v NUM
The number of virtual Ethernet devices to configure; the maximum number of
virtual devices currently supported is eight (8).
--config (port,queue,lcore)[,(port,queue,lcore)]
Each port/queue/lcore group specifies the CPU ID of the core that will handle
ingress traffic for the specified queue on the specified port.
DPDK also provides dynamic flow manipulation using ovs-dpctl. A few commands
supported by ovs-dpctl are add-flow, del-flow, mod-flow, and get-flow; these are
used to dynamically add flows to a switch, delete flows, modify the actions on a
flow, and get the actions set on a particular flow, respectively. Details on
these commands can be obtained from [17].
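A management script could drive these commands programmatically. The sketch below only assembles ovs-dpctl command lines: the command names come from [17], but the datapath name and the flow-match syntax shown are illustrative assumptions, not the full grammar.

```python
# Sketch: build ovs-dpctl command lines for dynamic flow manipulation.
# Command names (add-flow/del-flow/mod-flow/get-flow) are from [17]; the
# flow-match syntax shown is illustrative, not a complete grammar.

import shlex

def dpctl_cmd(action, datapath, flow, actions=None):
    assert action in ("add-flow", "del-flow", "mod-flow", "get-flow")
    parts = ["ovs-dpctl", action, datapath, flow]
    if actions is not None:          # add-flow/mod-flow take an action list
        parts.append(actions)
    return " ".join(shlex.quote(p) for p in parts)

# Example: add a flow on hypothetical datapath "dp0" that forwards packets
# arriving on port 1 out of port 2.
cmd = dpctl_cmd("add-flow", "dp0", "in_port(1)", "2")
assert cmd == "ovs-dpctl add-flow dp0 'in_port(1)' 2"
```

In a real deployment, the control plane would execute such commands (e.g. via a remote shell) on each host that the manager has configured.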
Data-plane Management blocks
Software-defined data centers virtualize existing network functions like routers,
firewalls, and load balancers.

NetVM

NetVM [18] is motivated by NFV's requirement for high speed and low latency. The
authors of [18] provide:
1. Virtualization-based high-speed (line-rate) packet delivery
2. A memory-sharing framework for network data
3. Hypervisor-based switching
4. High-speed inter-VM communication
NetVM makes use of the Intel Data Plane Development Kit (DPDK) [17], with some
tuning, to achieve high performance. Figure 7 shows the working of NetVM.
Figure 7: NetVM System Overview (Source [18])
1. NetVM runs in the hypervisor user space
2. Memory is shared by all trusted VMs, to enable zero-copy packet delivery
3. Each VM has its own RX/TX packet-descriptor rings
The NetVM solution requires trusted VMs, which may not always be the case.
NetVM uses DPDK as the building block for software flow configurations. It
provides NetLib, an interface between the PCI device and user applications.
Users can read and write packets from user space through this interface, and
perform actions such as discarding a packet, forwarding it to another VM, or
sending it out to the NIC.
FasTrack

FasTrack [19] is motivated by the requirement of providing thousands of network
rules, policies, and traffic rate limits to achieve tenant isolation and QoS. It
exploits the temporal locality in flows and flow sizes, and offloads a subset of
the network rules from the hypervisor to the switch hardware. It dynamically
migrates these rules from the hypervisor to the switch, and vice versa.
Figure 8: FasTrack Overview
The FasTrack design comprises an SDN controller that decides which subset of
flows should be offloaded. Figure 8 shows that the SDN controller directs the
flows selected by the per-VM flow-placement module through the hypervisor PV Vif
driver, and the others through the SR-IOV VF driver. The flow-placement module
integrates with an OpenFlow interface, allowing the controller to program it.
Such configurations can be used to colocate two VMs where one requires
deterministic bandwidth (SR-IOV) and the other is fine with best-effort
service (PV).
Scalable NIC for end-host Rate limiting (SENIC)
Figure 9: SENIC: Schedule and Pull model (Source: [21])
SENIC [21] is an end-host rate-limiting technique with the following features:
1. Scales to thousands of traffic classes
2. Works at high link speeds
3. Has low CPU overhead
4. Provides accurate and precise rate limiting
5. Supports hypervisor bypass
Figure 9 shows the working of SENIC:
1. The OS notifies the NIC about the packet; per-class queues are scheduled and
stored in host RAM
2. The packet is DMAed from host memory to the NIC; late binding of the packet
to the NIC eliminates the need for expensive SRAM buffers
3. The packet is transmitted
SENIC supports tens of thousands of rate limiters with accurate and precise rate
limiting, so we could provide QoS guarantees for that many flows on a single
physical host. But SENIC requires hardware modifications; this building block
would be considered if some hardware device (like an SR-IOV NIC) implements
these modifications.
ClickOS [20] is motivated by the idea of shifting middlebox processing into software. ClickOS is a tiny Xen-based VM that runs Click. It has a single address space, runs on a single core, uses a non-preemptive scheduler, and executes a single application (Click); the VM just processes packets. The requirements of such a system would be:
1. Fast instantiation
2. Small footprint
3. Isolation
4. Performance
5. Flexibility
ClickOS matches the requirements by providing:
1. Boot times of less than 30 msec
2. A memory footprint of about 5 MB when running
3. Isolation provided by Xen
4. 10 Gbps line rate (in all cases except small packet sizes), with about 45 microsec of added per-packet delay
5. Flexibility provided by Click
Optimizations provided by ClickOS to achieve high packet rates are:
1. Reusing Xen page permissions (front-end)
2. Introducing VALE, a fast netmap-based software switch (80 Mp/s), as the backend switch
3. Increasing the I/O request batch size
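The benefit of optimization 3 can be seen with a back-of-the-envelope model: each hypervisor ring notification carries a fixed overhead, and batching amortizes it across many packets. The costs below are illustrative, not measured values from the paper.

```python
# Why larger I/O request batches raise packet rate: the fixed
# per-notification cost is shared across the whole batch.

def pkts_per_sec(batch, per_pkt_ns=50, per_notify_ns=2000):
    """Achievable packet rate given an illustrative per-packet cost and
    a fixed notification cost amortized over `batch` packets."""
    cost_per_pkt_ns = per_pkt_ns + per_notify_ns / batch
    return 1e9 / cost_per_pkt_ns

print(round(pkts_per_sec(1)))    # one notification per packet
print(round(pkts_per_sec(32)))   # cost amortized over 32 packets
```

With these made-up costs the batched case is over an order of magnitude faster, which is the effect the ClickOS optimizations exploit.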
ClickOS can be used as the building block for devices like logical switches, logical routers, and other NFV devices. OpenFlow support is available to provide flow-level control.
Table 4: Comparison of different data-plane management blocks. FasTrack, NetVM, and ClickOS are compared along the following dimensions:
• Support for thousands of flow rate-limiters
• Whether trust between VMs is required
• Whether only specific applications can be provisioned
• Line-rate support
• Migration support
Problem Formulation
Requirements for a Network-QoS solution
Following are the requirements for a Network-QoS solution:
1. Provide minimum guarantees
• Per-flow rate-limiting to achieve bandwidth guarantees
• Per-flow delay guarantees
2. High-speed packet I/O, i.e., high network-resource utilization
3. High-speed inter-VM communication
Some of the use-cases would be NFVs and the tiers of a multi-tier application, where VMs are chained.
Inter-tenant traffic example: Amazon AWS offers sixteen services that result in network traffic between tenant VMs and service instances. These services provide diverse functionality, ranging from storage to load balancing and monitoring.
4. Performance Isolation
QoS control of one VM should not degrade the performance of other VMs.
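The per-flow bandwidth guarantee of requirement 1 is classically enforced with a token bucket: tokens accrue at the guaranteed rate, and a packet is admitted only when enough tokens are available. The sketch below is a generic illustration, not a mechanism from any of the surveyed systems.

```python
# Generic token-bucket rate limiter for a single flow.

class TokenBucket:
    def __init__(self, rate_Bps, burst_B):
        self.rate = rate_Bps          # guaranteed rate, bytes/sec
        self.capacity = burst_B       # maximum burst, bytes
        self.tokens = burst_B
        self.last = 0.0

    def allow(self, now, pkt_bytes):
        """Admit the packet if the flow has earned enough tokens."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return True
        return False

tb = TokenBucket(rate_Bps=1000, burst_B=1500)
print(tb.allow(0.0, 1500))   # True: burst allowance covers the packet
print(tb.allow(0.1, 1500))   # False: only ~100 bytes of tokens accrued
print(tb.allow(1.6, 1500))   # True: 1.5 s of accrual refills the bucket
```

Running one such bucket per flow (in software or in NIC hardware, as in SENIC) gives the per-flow rate limits that requirement 1 asks for, while the cap on burst size bounds the interference one flow can cause, which is the essence of requirement 4.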
Researchers have provided solutions for 'Transmit Bandwidth Guarantees' through software. We propose to provide a QoS solution through hardware, using 82576/82599 SR-IOV NICs. We intend to provide TX/RX bandwidth guarantees, and also packet-delivery-delay guarantees after packet reception. Researchers have not considered the RX bandwidth, and high receive traffic of one VM could take up the resources of other VMs, reducing their throughput.
Expected Input:
• For each flow within a VM, and for each VM within a single physical machine: the QoS requirement, i.e., expected transmit/receive bandwidth values, and the expected delay between packet reception and delivery
• Physical-host details: available bandwidth, percentage of CPU available for network I/O, and SR-IOV NIC specifications such as the number of queues/pools
Expected Output:
• SR-IOV configuration parameters and their values, such that the requirements of all flows are satisfied.
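The mapping from expected input to expected output can be sketched as a feasibility check followed by a per-VF allocation. The function and field names below are hypothetical; a real solution must also place individual flows into queues and honor the CPU and delay constraints.

```python
# Hypothetical sketch: given per-VM bandwidth demands and the SR-IOV
# NIC's capabilities, check feasibility and emit per-VF settings.

def plan_sriov_config(vm_demands_mbps, link_mbps, num_vf_pools):
    if len(vm_demands_mbps) > num_vf_pools:
        raise ValueError("more VMs than available VF pools")
    if sum(vm_demands_mbps.values()) > link_mbps:
        raise ValueError("aggregate demand exceeds link bandwidth")
    # one VF per VM, rate-limited to its guaranteed bandwidth
    return {vm: {"vf": i, "tx_limit_mbps": mbps}
            for i, (vm, mbps) in enumerate(sorted(vm_demands_mbps.items()))}

cfg = plan_sriov_config({"vm1": 600, "vm2": 300, "vm3": 100},
                        link_mbps=1000, num_vf_pools=8)
print(cfg["vm1"]["tx_limit_mbps"])  # 600
```

If either check fails, the flows cannot all be admitted on this host, which connects this configuration problem to VM placement and migration decisions.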
Solution for transmit bandwidth partitioning [9]
The authors of [9] specify a mechanism for transmit bandwidth partitioning of the 1 Gbps link in 82576 controllers. They provide a configuration method to set the bandwidth of each VF by setting the RF_DEC and RF_INT register values. Details of these registers are available in the 82576 datasheet [5] and are also shown in Figure 10.
Figure 10: Bandwidth Control of Intel 82576 GbE (Source: [5])
After making the VFs assignable, and before assigning them to VMs, the bandwidth of each VF can be configured using the command:
#echo "600 300 100" > /sys/class/net/eth1/device/bandwidth_allocation
The above command partitions the 1 Gbps NIC bandwidth into 600 Mbps, 300 Mbps, and 100 Mbps for VF1, VF2, and VF3, respectively.
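For NICs that express a VF rate limit as a "rate factor" (as in 82599-style VF TX rate limiting, where the hardware stores link_speed/target_rate as an integer part, RF_INT, plus a 14-bit fractional part, RF_DEC), the register values can be derived as below. The exact register semantics must be taken from the datasheet [5]; this is an illustrative sketch, not driver code.

```python
# Illustrative derivation of RF_INT/RF_DEC from a target rate, assuming
# the rate factor is link_speed / target_rate with a 14-bit fraction.

def rate_factor(link_mbps, target_mbps, frac_bits=14):
    factor = link_mbps / target_mbps
    rf_int = int(factor)                              # integer part
    rf_dec = int((factor - rf_int) * (1 << frac_bits))  # fixed-point fraction
    return rf_int, rf_dec

print(rate_factor(1000, 300))  # (3, 5461): 1000/300 = 3.333...
```

A driver would then program these two fields into the per-VF rate-control register to cap that VF's transmit bandwidth.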
In this report, we have studied the literature on network virtualization and provided an analysis of its basic data-plane building blocks. We propose a methodology to achieve application QoS, and walk through the network-virtualization stack with the help of an example to show how application QoS can be achieved. We have also identified other possible problem areas in this domain.
[1] Yaozu Dong, Xiaowei Yang, Xiaoyong Li, Jianhui Li, Kun Tian, and Haibing
Guan. High Performance Network Virtualization with SRIOV. In HPCA, 2010.
[2] Nadav Har'El, Abel Gordon, and Alex Landau. Efficient and Scalable Paravirtual I/O System, a.k.a. ELVIS. In ATC, 2013.
[3] Jianglu Chen, Jian Li, and Fei Hu. SR-IOV based Virtual Network Sharing. In
Proceedings of the Second International Conference on Innovative Computing
and Cloud Computing, 2013.
[4] Abel Gordon, Nadav Amit, Nadav Har'El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, and Dan Tsafrir. ELI: Bare-Metal Performance for I/O Virtualization. In ASPLOS, 2012.
[5] Intel 82576 SR-IOV Driver Companion Guide, Revision 1.00, June 2009.
[6] Intel 82599 SR-IOV Driver Companion Guide, Revision 1.00, May 2010.
[7] igb Linux Base Driver for Intel Ethernet Network Connection, July 2013.
[8] igbvf Linux Base Driver for Intel Ethernet Network Connection, January 2014.
[9] Jun Kamada and Simon Horman. Evaluation and improvement of I/O scalability for Xen. In Xen Summit Asia, Nov. 2009.
[10] Binbin Zhang, Xiaolin Wang, Rongfeng Lai, Liang Yang, Yingwei Luo, Zhenlin Wang, and Xiaoming Li. A Survey on I/O Virtualization and Optimization. In ChinaGrid, 2010.
[11] Himanshu Raj and Karsten Schwan. High Performance and Scalable I/O Virtualization via Self-Virtualized Devices. In HPDC, 2007.
[12] Brad Hedlund, Scott Lowe, and Ivan Pepelnjak. VMware NSX Architecture, webinar series. YouTube, October 2013.
[13] The VMware NSX Network Virtualization Platform. Technical White Paper.
[14] VMware NSX Network Virtualization Design Guide.
[15] VMware NSX The Platform for Network Virtualization, Datasheet.
[16] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the Art of Virtualization. In SOSP, 2003.
[17] Intel Corporation. Intel data plane development kit: Getting started guide.
[18] Jinho Hwang, K. K. Ramakrishnan, and Timothy Wood. NetVM: High Performance and Flexible Networking using Virtualization on Commodity Platforms. In NSDI, 2014.
[19] Radhika Niranjan Mysore, George Porter, and Amin Vahdat. FasTrak: Enabling Express Lanes in Multi-Tenant Data Centers. In CoNEXT, 2013.
[20] Joao Martins, Mohamed Ahmed, Costin Raiciu, Vladimir Olteanu, Michio Honda, Roberto Bifulco, and Felipe Huici. ClickOS and the Art of Network Function Virtualization. In NSDI, 2014.
[21] Sivasankar Radhakrishnan, Yilong Geng, Vimalkumar Jeyakumar, Abdul Kabbani, George Porter, and Amin Vahdat. SENIC: Scalable NIC for End-Host Rate Limiting. In NSDI, 2014.