A Survey of Network Isolation Solutions for Multi-Tenant Data Centers

Valentin Del Piccolo∗, Ahmed Amamou†, Kamel Haddadou†, and Guy Pujolle‡

To cite this version: Valentin Del Piccolo, Ahmed Amamou, Kamel Haddadou, Guy Pujolle. A Survey of Network Isolation Solutions for Multi-Tenant Data Centers. IEEE Communications Surveys & Tutorials, IEEE, 2016, 18 (4), pp.2787 - 2821. <10.1109/COMST.2016.2556979>. <hal-01430684>

HAL Id: hal-01430684
http://hal.upmc.fr/hal-01430684
Submitted on 10 Jan 2017

Abstract

The Infrastructure-as-a-Service (IaaS) model is one of the fastest growing opportunities for cloud-based service providers. It provides an environment that reduces operating and capital expenses while increasing agility and reliability of critical information systems. In this multitenancy environment, cloud-based service providers are challenged with providing a secure isolation service combining different vertical segments, such as financial or public services, while nevertheless meeting industry standards and legal compliance requirements within their data centers. In order to achieve this, new solutions are being designed and proposed to provide traffic isolation for a large number of tenants and their resulting traffic volumes. This paper highlights key challenges that cloud-based service providers might encounter while providing multi-tenant environments. It also succinctly describes some key solutions for providing simultaneous tenant and network isolation, and highlights their respective advantages and disadvantages. We begin with Generic Routing Encapsulation (GRE), introduced in 1994 in "RFC 1701", and conclude with today's latest solutions. We detail fifteen of the newest architectures and then compare their complexities, the overhead they induce, their VM migration abilities, their resilience, their scalability, and their multi data center capacities. This paper is intended for, but not limited to, cloud-based service providers who want to deploy the most appropriate isolation solution for their needs, taking into consideration their existing network infrastructure. This survey provides details and comparisons of various proposals while also highlighting possible guidelines for future research on issues pertaining to the design of new network isolation architectures.

1 Introduction

Data centers are being increasingly used by both corporations and individuals. For example, in Cisco's forecast , personal content locker services, like Amazon Cloud Drive, Microsoft SkyDrive, and Google Drive, are expected to increase their total traffic at a Compound Annual Growth Rate (CAGR) of 57% between 2013 (2 Exabytes) and 2018 (19 Exabytes). In addition to a growing use of data centers by consumers (individuals), the use of data centers by corporations will also increase. As stated in , in 2013 global data center traffic reached 3.1 zettabytes for the year and is expected to grow to 8.6 zettabytes in 2018, representing a nearly 3-fold increase.

However, it is important to make a distinction between two data center types. The first type of data center is the traditional one, which possesses specialized servers. By contrast, the second type of data center is the cloud data center, which possesses non-specialized servers. These different data center types will not see the same increase in traffic. The traffic from traditional data centers will "only" increase by 8% CAGR between 2013 and 2018, while cloud data center traffic will see an increase of 32% CAGR during the same period, as predicted in . In terms of workloads, in 2013 cloud data center workloads represented 54% of total data center workloads, and in 2018 they will represent 76% of the total data center workloads. We can therefore see a shift in favor of cloud data centers.

This can be explained by one major advantage of a cloud data center over a traditional data center. A cloud data center is more amenable to virtualization than a traditional data center. Indeed, cloud data centers are data centers with virtualized devices. With the hardware improvements of data center nodes, it is possible to run several virtual machines (VMs) on one physical node. Using several VMs on a physical node allows using it to the fullest of its capabilities. It therefore spends less time in an idle state and wastes less energy. Consequently, virtualization is cost-effective for both the infrastructure provider and their customers. The infrastructure provider, owner of the data center, has nodes actively processing the data of its clients instead of being held in an idle state. This increases the time the nodes spend under load, which in turn increases their cost-effectiveness. The infrastructure provider therefore needs fewer physical devices for a fixed number of clients. Instead of having multiple physical devices for one client, the provider has multiple VMs for this client. All these VMs can be on one physical device (Figure 1). Virtualization also adds functionalities such as:

∗ Valentin Del Piccolo is with the Research and Development Department of GANDI SAS, Paris, France, and a PhD student at the University Pierre et Marie Curie (UPMC), Paris, France.
e-mail: firstname.lastname@example.org
† Dr Ahmed Amamou and Dr Kamel Haddadou are with the Research and Development Department of GANDI SAS, Paris, France, e-mail: email@example.com, firstname.lastname@example.org
‡ Pr Guy Pujolle is a Professor at the University Pierre et Marie Curie, Paris, France. e-mail: Guy.Pujolle@lip6.fr

• Remote OS install.
• Access to server console.
• Reboot of frozen server.
• Guest OS choice.
• Possibility of server snapshots for backups.
• Hardware upgrade without shutting down the VM.
• Possibility of VM migration on a newer server with the backup image of the VM.

Figure 1: Two VMs on one node

However, sharing a physical node among several clients implies that there is no device or data isolation. Nevertheless, clients do not want their data exposed to other clients, who might even be competitors. In order to solve this problem, it is necessary to deploy techniques that will provide client isolation. This results in clients only seeing VMs and traffic that they own, and makes them believe that they are alone on the network.

The remainder of the survey is organized as follows. After quickly reviewing terminology and definitions that pertain to multitenancy isolation in the cloud (Section 2) and explaining tunneling and virtual network notions (Section 3), we detail in Section 4 some network isolation solutions developed before the cloud era in order to show why new solutions are needed for cloud data centers with multi-tenant issues. In Section 5 we present those new solutions which provide multi-tenant isolation in cloud data centers. Then we focus on fifteen solutions that provide tenants' traffic isolation, as follows: the Locator/Identifier Separation Protocol (LISP), Network Virtualization using Generic Routing Encapsulation (NVGRE), Stateless Transport Tunneling Protocol (STT), 802.1ad or QinQ, 802.1ah or mac-in-mac, Virtual eXtensible Local Area Network (VXLAN), Diverter, Portland, Secure Elastic Cloud Computing (SEC2), BlueShield, VSITE, NetLord, Virtual Network over TRILL (VNT), VL2, and Distributed Overlay Virtual nEtwork (DOVE) [16, 17]. We compare them using six criteria in Section 7. We then discuss the future of tenant isolation (Section 8) and, finally, present our conclusions (Section 9).

2 Terminology

In this section we define both terms Tenant and Multitenancy. To do so, we use the definitions given in [18, 19, 20, 21].

2.1 Tenant

In Cisco Virtual Multi-Tenant Data Center 2.0 a tenant has two definitions. In the private cloud model a tenant is defined as "a department or business unit, such as engineering or human resources". In the public cloud model a tenant is "an individual consumer, an organization within an enterprise, or an enterprise subscribing to the public cloud services". In version 2.2 of Cisco Virtual Multi-Tenant Data Center the difference between public and private cloud has been removed and a tenant is "an user community with some level of shared affinity". To explain this definition, examples are provided in which a tenant may be a business unit, department, or work group.

Juniper gives a different definition: their white paper states that "a cloud service tenant share a resource with a community". In order to express this definition more clearly, the example of building tenants is given. In this metaphor, a building tenant has to share the building's infrastructure just like a cloud service tenant. However, a tenant can also have tenants of its own, as stated in . The given example is the case of Second Life, which is a tenant of Amazon Web Services and which has tenants of its own, who could also have tenants and so on. Wider than Cisco's definition of a tenant, Juniper defines a tenant as a cloud service user. This user can be a person or a company or, as in Cisco's definition, a business unit from a company. In this paper we use the term tenant as defined by Juniper.

2.2 Multitenancy

For Cisco, "virtualized multi-tenancy" is a key concept which refers to "the logical isolation of shared virtual compute, storage, and network resources". In continuation with the building metaphor, most of the time there is not only one tenant in a building. Therefore the building is a multitenancy environment. Each tenant wants privacy, so tenants are isolated in apartments. This metaphor is well presented in "The Force.com Multitenant Architecture" and is reproduced below:

"Multitenancy is the fundamental technology that clouds use to share IT resources cost-efficiently and securely. Just like in an apartment building - in which many tenants cost-efficiently share the common infrastructure of the building but have walls and doors that give them privacy from other tenants - a cloud uses multitenancy technology to share IT resources securely among multiple applications and tenants (businesses, organizations, etc.) that use the cloud."

For Juniper, multitenancy is the idea of many tenants sharing resources. It is also a key element for cloud computing. However, multitenancy also depends on the service provided. For example, in an IaaS environment, the provider provides infrastructure resources like hardware and data storage to the tenants, who in turn must share them. In a SaaS environment, tenants use the same applications, so there is a chance that their data is stored in a single database by the service provider. There are security constraints to apply at each layer. In this survey we focus on data center architectures providing tenants' traffic isolation.

3 Background

This section explains the notions of virtual networks (Section 3.1) and tunneling (Section 3.2). We also detail the relation between multitenancy, virtual networks, and tunneling in Section 3.3.

3.1 Virtual network

Microsoft defines, in , a virtual network as a configurable network overlay. The devices from one virtual network can reach each other but those outside of it can not, thus providing isolation to the devices inside the virtual network. A more concise definition of a virtual network is given in : "[...] a virtual network is a subset of the underlying physical network resources." Using both definitions we see that a virtual network is a configurable overlay network that uses resources, virtual nodes, and virtual links of a physical infrastructure while at the same time keeping them isolated. A virtual node is a logical node using at most all the resources of a physical node. A virtual link works the same way as a virtual node but using resources of a physical link. This being said, a virtual network does not use all the resources of a physical infrastructure, and so it is possible to have several virtual networks over a physical infrastructure. In order to achieve this, resources from physical nodes and physical links must be allocated to each virtual network. Information about resource allocation algorithms and technology can be found at and .

Figure 2: Example of overlay networks

Figure 2 shows an example of a physical infrastructure hosting two overlay networks. We can see that one physical node belongs to both overlay networks, as it hosts one virtual node from each overlay network, and that some links must transit data from both overlay networks. Each overlay network must be isolated when sharing the same infrastructure. Tunneling is therefore mandatory to keep data from exiting a virtual network.

3.2 Tunneling

Figure 3: Tunnel

Figure 3 shows the concept of tunneling: data is carried through a tunnel from one point to another. Data using this tunnel is isolated from the rest of the network. In , Cisco defines tunneling as "a technique that enables remote access users to connect to a variety of network resources (Corporate Home Gateways or an Internet Service Provider) through a public data network." This definition is represented in Figure 3, as we have two sites (A and B) interconnected through the Internet. In , "A tunneling protocol is one that encloses in its datagram another complete data packet that uses a different communications protocol. They essentially create a tunnel between two points on a network that can securely transmit any kind of data between them." Additionally, in , "Tunneling enables the encapsulation of a packet from one type of protocol within the datagram of a different protocol." These two definitions add the notion of a packet being encapsulated in another packet, thus the tunneling protocol creates a new packet from the original packet. For example, GRE adds a new header (Figure 5) to the packet.

3.3 Multitenancy via virtual networks and tunneling

As stated in Section 3.1, a virtual network only uses a portion of the physical infrastructure's resources. This implies that the rest of the resources can be used for other virtual networks in order to maximize infrastructure usage. Doing so means that the physical resources are shared among virtual networks. Since each virtual network belongs to a tenant, the infrastructure provides resources for multiple tenants. This is what we call multitenancy (Section 2).

To grasp the notion of isolation in a data center, it is necessary to understand that the goal of the data center operator is to maximize the use of its infrastructure. To achieve this, the operator uses virtual networks and tunneling in order to accommodate the maximum number of tenants possible on its infrastructure. Several hundreds or thousands of tenants can share the same infrastructure inside a data center, thus the main challenge when providing isolation in this environment is to be able to provide it for a very large number of tenants. Each tenant wants to have its network isolated from other tenants, therefore scalability is a concern. Another challenge is to provide an isolation solution that can sustain misbehavior or misconfiguration inside tenants' networks without impacting other tenants.
Therefore the solution must be resilient. Additionally, isolation inside a data center must be assured inside the whole data center, therefore all the devices composing the infrastructure must manage the chosen isolation solution. A fourth challenge is to maintain availability of the data center even when updating the infrastructure by adding new devices or tenants. Moreover, each tenant has its own rules and policies, thus the isolation solution must enforce those rules only for the right tenant. To summarize, the main issue with multitenancy in a data center is caused by the huge numbers of tenants, policies, servers and links. It is mostly a scalability issue. However, the isolation solution must cope with this issue without degrading the performance of the infrastructure too much. Therefore, another challenge for multitenancy is to have an isolation solution with a low overhead.

Figure 4: Multitenancy

An example of a multi-tenant data center is shown in Figure 4. In this example, the three tenants each have data stored on Server 1 (S1) and each tenant possesses a virtual network, therefore their traffic is tunneled from end to end. The various flow representations indicate isolation along the whole path for each tenant's traffic while using a common infrastructure.

In order to enforce multitenancy we need both isolation in the network (Section 3.3.3), via the tunneling protocol, and isolation in the nodes, via hypervisor-level isolation (Section 3.3.1) or database-level isolation (Section 3.3.2). However, achieving isolation brings some performance degradation (Section 3.4.1) as well as risks (Section 3.4.2) when the isolation solution is violated.

3.3.1 Hypervisor level isolation

A hypervisor is a software mapping of the physical machine commands to a virtualized machine (VM) running a regular OS. The hypervisor intercepts the OS system calls of the VM and maps these calls to the underlying hardware. This implies that the hypervisor induces a certain percentage of overhead. However, a hypervisor allows the partition of the hardware, thus having several tenants on the same physical node. Since the hypervisor is between the VM and the node, it intercepts all the traffic, thus it can isolate each virtual machine and its resources. With this isolation, a hypervisor allows the protection of tenant resources, thereby enabling multitenancy.

3.3.2 Database level isolation

While hypervisor-level isolation is used in Infrastructure-as-a-Service (IaaS), database-level isolation is used in Software-as-a-Service (SaaS). In SaaS, tenants share a database. In , the authors describe the three main approaches for managing multi-tenant data in a database. The first approach is to have a separate database for each client. The second one is to have a shared database but with separate schemas, therefore multiple tenants use the same database but each tenant possesses its own set of tables. The last approach is to share both the database and the schemas. In this case an id is appended to each record in order to indicate which tenant is the owner of the record. In , a new schema-mapping technique for multi-tenancy called "Chunk Folding" is proposed in order to improve the performance of database sharing among multiple tenants. Additionally, in , the authors propose a solution called SQLVM which focuses on allocating resources to tenants dynamically while ensuring low overheads. Other solutions like [32, 33, 34] focus on improving database performance when the number of users increases.

This type of isolation has more security concerns than hypervisor-level isolation. If the modification of the request is mis-configured or if there is an error in an access control list, then tenants' information is at risk.

3.3.3 Network level isolation

Tunneling is a key element for isolation inside a data center, however data centers have different constraints than typical LAN networks. The most important one is the number of different tenants whose isolation must be provided. Therefore the scalability of the tunneling protocol, and the maximum number of tenants it can manage, is a criterion to take into account when choosing a tunneling protocol. If the tunneling protocol can not isolate all the tenants of a data center then there is no interest in using it, therefore we performed a scalability comparison in Section 7.5. Another criterion is the overhead induced by the tunneling protocol. The challenge is to have the lowest overhead possible. In Section 7.2 we compare the overhead induced by the tunneling protocol. A third criterion, which influences overhead, is the security provided by the tunneling protocol (Section 7.1.4). Then the resilience criterion is also important in order to quickly mitigate any link or node failure (Section 7.4). As we focus on data center networks, we also choose as a criterion the ease of multi data center interconnection, which we describe in Section 7.6.

Another element of choice when deciding which tunneling protocol to choose is how it enforces its tunnel. There are two possibilities as indicated in . The first one is the Host isolation technique described at the beginning of Section 4.1. In this technique, the ingress and egress nodes are the ones enforcing the isolation. The second technique, called the Core isolation technique (Section 4.2), enforces network isolation at each switch on the path. However, while these criteria must be taken into account, the goal is to have the greatest scalability possible with the lowest overhead and complexity.

3.4 Impact of isolation

Providing multitenancy is a key function for cloud data centers. However, this functionality implies overhead (Section 3.4.1) and also risks (Section 3.4.2) in case the tenants' isolation fails.

3.4.1 Isolation performances overhead

Hypervisor performances have already been studied in several papers [36, 37, 38, 39]. Hypervisors induce overhead, thus performance is decreased. However, not all hypervisors have the same impact on performance. In , four hypervisors (Hyper-V, KVM, vSphere, Xen) are compared over four criteria (CPU, Memory, Disk, Network). Their results show that hypervisor overhead is globally low, therefore hypervisors do not deteriorate performance too much.

Network isolation solutions are the main subject of this paper, thus in Section 7 a comparison is done between several of them. First we study their complexity based on six criteria. The first is the control plane design of the solution. The second is the network restrictions imposed by each solution: some of them only work with Layer 3 (L3) networks, others only with Layer 2 (L2) networks, and some of them need a specific architecture. The third criterion focuses on tunnel configuration and establishment. We analyze whether messages are needed to establish a tunnel, or whether it must be allocated beforehand on each node. The fourth criterion is tunnel management and maintenance, in order to determine the quantity of messages needed by each protocol to keep those tunnels alive. The fifth criterion is the capacity of those tunnels to handle multiple protocols. The sixth, and last, criterion of this complexity study focuses on their security mechanisms. Then we study the overhead induced by each solution, followed by their capability to migrate VMs and a comparison of their resilience. The last criterion for their comparison is their scalability and their capacity to be managed among multiple data centers.

3.4.2 Isolation violation risks

Multitenancy is based on sharing the underlying physical infrastructure, thus tenants' data is stored on the same devices while being isolated via tunneling protocols and hypervisor- or database-level isolation. However, if one of these isolation mechanisms fails, then the data can be seen by other tenants. Having the hypervisor- or the database-level isolation fail implies that all the tenants sharing the same hypervisor or database can access all the data managed by the hypervisor or stored inside the database. This is an issue, however it is restricted to one node and can be resolved without shutting down the whole data center. The worst case is when the tunneling protocol fails. In this case all tenants' traffic is visible, thus data can be stolen or misused by other tenants. To resolve this situation, the data center must be stopped in order to reconfigure or change the tunneling protocol, thus the choice of a tunneling protocol is an important decision.

4 Network isolation in traditional data center

The network solutions introduced in this section were designed before the development of cloud data centers. They possess capabilities for isolating flows in a network but are either not scalable enough, or were not designed for cloud data center topologies. As such they can not cope with the increasing number of flows and Virtual Machines (VMs) to isolate. Additionally they can not manage VM live migration, which is not necessary for traditional data centers in which there is no VM, but is mandatory for cloud data centers. Therefore, we present those solutions because they can be used in some traditional data centers and mostly to show that there is a need for new multi-tenant network isolation solutions.

4.1 Host isolation

Host isolation is an isolation method which selects flows once they arrive at the destination host or the egress node of the tunnel. This means that all along the path no switching or routing device checks the packets or messages of the flow. The switching or routing is done normally using the information of the transport header. It also means that there is no explicit need for a tunneling protocol in this kind of isolation.
For example, the Ethernet protocol is a Host isolation protocol. When a packet reaches a host, this host checks the MAC address and accepts the packet only if the MAC address of the packet matches the MAC address of the host. It is only once the packet arrives at the destination host or the egress node of the tunnel that checks are done. Either the destination host or the egress node verifies if both the destination Virtual Machine (VM) and the flow, to which the data belongs, are from the same virtual network. If they do belong to the same virtual network then the data is either delivered to the VM or dropped depending on policies. The advantage of Host isolation is that it does not require the node to know the whole topology. Indeed it is not necessary to know the location of the other VMs in order to distribute the flows. However, this technique has drawbacks. The lack of information about the VMs belonging to the same network imposes that the flows of each VM be propagated in the whole data center. This creates useless traffic, overloading the data center, which is dropped when received by a physical node not belonging to the same network. Additionally, the security of such an isolation technique is weak against a "man in the middle" attack. An attacker who is able to put a listening device in the network could see the traffic from all the clients.

4.1.1 Host isolation for both Layer 2 and Layer 3 networks

The protocol presented in this section can be used over both Layer 2 and Layer 3 networks.

4.1.1.1 GRE: Generic Routing Encapsulation

The Generic Routing Encapsulation protocol was proposed in "RFC 1701" . In this RFC, the goal of GRE is to encapsulate any other protocol with a simple and lightweight mechanism. To do that, GRE adds a header to the packet (Figure 5). The original packet is called the payload packet and the GRE header is added to it. Then, if the packet needs to be forwarded, the delivery protocol header is also added to the packet. The GRE header is composed of nine mandatory fields, for a total of 4 bytes, and of five optional fields with a variable total length. In these optional fields, there is a routing field which contains a list of Source Route Entries (SRE). Thanks to this field, it is possible to use the GRE header to forward the packet without adding a delivery header.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C|R|K|S|s|Recur|  Flags  | Ver |         Protocol Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Checksum (optional)      |       Offset (optional)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Key (optional)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Sequence Number (optional)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Routing (optional - variable length)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

C = Checksum Present
R = Routing Present
K = Key Present
S = Sequence Number Present
s = strict Source Route
Recur = Recursion Control
Ver = Version

Figure 5: GRE Header

In the second RFC, "RFC 2784" , derived from the original "RFC 1701" without superseding it, the GRE header has been simplified. The header is now made of 4 mandatory fields (4 bytes) and of 2 optional fields (4 bytes). The header length is now limited to 8 bytes whereas in the first RFC there was no length limit. The new header needs the delivery header because it no longer contains any information to forward or route the packet.

As GRE is a lightweight mechanism, some functionalities are not managed, such as the discovery of the MTU along the path. This could be an issue if the source sends a packet with the "don't fragment bit" set in the delivery header and the packet is too big for the MTU. In this case the packet is dropped along the path. As the error message is not required to be sent back to the source, the source could keep sending packets too big for the MTU. Those packets would always be dropped and would never reach their destination.

GRE allows for an easy deployment of IP VPNs and can tunnel almost any protocol through those VPNs. Additionally it is possible to authenticate the encapsulator by checking the Key field. However, the provisioning is not scalable. In addition, GRE does not protect the payload of its packets because of a lack of integrity check and encryption. In order to resolve this last issue, it is possible to use GRE over IPSec.

4.1.2 Host isolation for Layer 3 networks

In this section we introduce four Host isolation protocols which require the underlying network to be a Layer 3 network.

4.1.2.1 PPTP: Point-to-Point tunneling protocol

The Point-to-Point Tunneling Protocol (PPTP), from "RFC 2637" , was introduced 5 years after the first version of GRE . This can explain why GRE is used to do the tunneling in PPTP. However, the GRE header is modified in PPTP (Figure 6). The Routing field is replaced by an acknowledgment number field of 4 bytes. This way, the header has a maximal length. The new acknowledgment number field is used to regulate the traffic of a session. As the PPTP tunnel multiplexes sessions from different clients, this acknowledgment number allows traffic policing for each session.
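To make the header layouts of Figure 5 and Figure 6 more concrete, the following Python sketch packs and parses the first 16 bits and the fixed-size optional fields of an RFC 1701 GRE header, and splits the Key field the way the PPTP header of Figure 6 lays it out (payload length in the upper two bytes, call ID in the lower two). It is only an illustration under stated assumptions: the function names `build_gre`, `parse_gre` and `split_pptp_key` are hypothetical, the variable-length Routing field is not decoded, and the rule that Checksum and Offset are both present whenever C or R is set is taken from RFC 1701 rather than from the text above.

```python
import struct

# Flag masks for the first 16 bits of an RFC 1701 GRE header (bit 0 is the MSB).
GRE_C, GRE_R, GRE_K, GRE_S, GRE_STRICT = 0x8000, 0x4000, 0x2000, 0x1000, 0x0800


def build_gre(protocol_type, key=None, sequence=None):
    """Hypothetical helper: build a GRE header with optional Key/Sequence fields."""
    flags, optional = 0, b""
    if key is not None:
        flags |= GRE_K
        optional += struct.pack("!I", key)
    if sequence is not None:
        flags |= GRE_S
        optional += struct.pack("!I", sequence)
    # 2 bytes of flags/version followed by 2 bytes of Protocol Type.
    return struct.pack("!HH", flags, protocol_type) + optional


def parse_gre(data):
    """Hypothetical helper: decode the mandatory and fixed-size optional fields."""
    if len(data) < 4:
        raise ValueError("a GRE header is at least 4 bytes long")
    word, protocol_type = struct.unpack("!HH", data[:4])
    hdr = {
        "C": bool(word & GRE_C),
        "R": bool(word & GRE_R),
        "K": bool(word & GRE_K),
        "S": bool(word & GRE_S),
        "s": bool(word & GRE_STRICT),
        "recur": (word >> 8) & 0x7,   # bits 5-7
        "version": word & 0x7,        # bits 13-15
        "protocol_type": protocol_type,
    }
    offset = 4
    # Per RFC 1701, Checksum and Offset are both present if C or R is set.
    if hdr["C"] or hdr["R"]:
        hdr["checksum"], hdr["offset"] = struct.unpack("!HH", data[offset:offset + 4])
        offset += 4
    if hdr["K"]:
        (hdr["key"],) = struct.unpack("!I", data[offset:offset + 4])
        offset += 4
    if hdr["S"]:
        (hdr["sequence"],) = struct.unpack("!I", data[offset:offset + 4])
        offset += 4
    # The variable-length Routing field, if any, would start at this offset.
    hdr["fixed_length"] = offset
    return hdr


def split_pptp_key(key):
    """Split a 32-bit Key as in the PPTP header: (payload length, call ID)."""
    return key >> 16, key & 0xFFFF
```

A round trip such as `parse_gre(build_gre(0x0800, key=42, sequence=7))` shows how each optional field that is present adds 4 bytes to the mandatory 4-byte header, which is why the RFC 1701 header has no fixed length.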
The tunnel formed by PPTP between the PPTP Access Concentrator (PAC) and the PPTP Network Server (PNS) is deployed over IP, which is why the routing field was not necessary (as the routing is done by IP). The other difference with GRE is the use of the key field. In PPTP, the key field is divided in two parts: the higher two bytes, which are used for the payload length, and the lower two bytes, which are for the call Id. The call Id represents the owner of the packet.

One of the advantages of PPTP is that it only needs two devices, one at each end of the tunnel: a PNS at one end and a PAC at the other. There is no need to know or interact with the network between both ends. For data confidentiality, PPTP uses an encryption technique called Microsoft Point-to-Point Encryption (MPPE). MPPE uses the RSA RC4 encryption algorithm and a session key to encrypt the packet. Three key lengths are supported: 40 bits, 56 bits, and 128 bits. Another advantage of PPTP is that there is no need for an agreement with the service provider. The administrator in charge of the PPTP session has complete control over it. However, the administrator must be able to install, configure, and manage a PPTP device at both ends of the tunnel.

The disadvantage of PPTP is that it uses TCP for all its signaling, so it does not support multipath and always uses the same path for a session. PPTP is end-user initiated because of its design. Only the two ends of the tunnel know about the tunnel, so the service provider is not aware of PPTP. Because of that there is no Quality of Service (QoS) possible.

deliver the PPP frames to the appropriate interface. L2TP possesses integrated security techniques such as an authentication between the client and the LAC at the initiation of the tunnel, a tunnel authentication between the LNS and the LAC, and a client authentication and authorization between the client and the LNS.
This last authentication can use the Password Authentication Protocol (PAP), the Challenge Handshake Authentication Protocol (CHAP), or a one time password. After that, the PPP session begins and the data can be exchanged. With L2TP, the ISP is needed which results in an extra cost in order to establish the tunnel. Nevertheless with the ISP’s involvement, it is possible to add Quality of Service guarantees (QoS) and to benefit from the ISP IP network reliability. The fact that the L2TP encapsulation is done by the LAC means that there is no need for client software. This is advantageous because it removes the difficulties associated with managing remote devices. However there is no multipath management because of the design of L2TP, in which a client is in one tunnel and the tunnel has only one path. But load balancing and redundancy are possible thanks to multiple home gateways. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |C|R|K|S|s|Recur|A| Flags | Ver | Protocol Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Payload Length | Call ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Sequence Number (Optional) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Acknowledgement Number (Optional) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ C = Checksum Present R = Routing Present K = Key Present S = Sequence Number Present s = Strict source route present Recur = Recursion control A = Acknowledgement sequence number present Flags = Set to zero Ver = Version set to 1 Payload Length = Size of the payload, not including the GRE header Call ID = Contains the Peer’s Call ID for the session to which this packet belongs Sequence Number = Contains the sequence number of the payload 18.104.22.168 L2TPv3: Layer Two Tunneling Protocol - Version 3 In L2TPv3, "RFC 3931" , the tunnel can be 
established between L2TP Control Connection Endpoint (LCCE) which are the LAC and the LNS. The novelty is that it is possible to have a LAC-to-LAC tunnel or a LNS-to-LNS tunnel. In addition a device can be a LAC for some sessions and a LNS for others. Another modification is the use of two headers, the control message header (Figure 7) and the data message header (Figure 8), instead of one header for all messages. The control message header has the same length as the original one but with one less field. The Session ID field is now 4 bytes long, instead of 2 bytes, and the Tunnel ID field is removed. L2TPv3 replaces the data header with a L2TPv3 Session Header. The RFC states that, "The L2TP Session Header is specific to the encapsulating Packet-Switched Network (PSN) over which the L2TP traffic is delivered. The Session Header MUST provide (1) a method of distinguishing traffic among multiple L2TP data sessions and (2) a method of distinguishing data messages from control messages." The LCCE from L2TPv3 does not need to be at a Point Of Presence (POP) of an ISP. Consequently, it is possible to establish a tunnel without the ISP’s help, thus reducing the cost. However, without the ISP there is no service guarantee on the Layer 3 network. Figure 6: PPTP Header 22.214.171.124 L2TP: Layer Two Tunneling Protocol L2TP was developed by the IETF and proposed as a standard in "RFC 2661" . L2TP is designed to transport PPP frames (Layer 2) over a packet-switched network (Layer 3). With L2TP it is possible to have two PPP endpoints residing on different networks interconnected by a Layer 3 network. It allows extending the Layer 2 network to other Layer 2 networks interconnected through a Layer 3 network. To design L2TP, the IETF used the Layer-2 Forwarding (L2F) and the Point-to-Point Tunneling Protocol (PPTP) as a starting point. L2F is a Cisco proprietary tunneling protocol which provides a tunneling service for PPP frames. 
PPTP was developed by Microsoft and is also designed to transport PPP frames over Layer 3 networks. L2TP works with two devices, the L2TP Access Concentrator (LAC) and the L2TP Network Server (LNS). Those are the endpoints of the L2TP tunnel. The LAC is located at the ISP’s Point of Presence (POP). The LAC exchanges PPP messages with users, and communicates with customers’ LNS to establish tunnels. To use L2TP, the ISP needs to be informed because they must have a L2TP-capable POP. This POP must encapsulate PPP frames within L2TP ones, and forward them through the correct tunnel toward the LNS, which belongs to the customer. The LNS must accept L2TP frames, and strip the L2TP encapsulation in order to 126.96.36.199 PWE3: Pseudo Wire Emulation Edgeto-Edge Pseudo Wire Emulation Edge-to-Edge (PWE3) is a technology that emulates services from Layer 2 such as Frame Relay, ATM, Ethernet over packet switched networks (PSN) using IP or MPLS. It was proposed in 7 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |T|L|Res|S| Reserved | Ver | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Control Connection ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Ns | Nr | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ T = Must be set to 1 to indicate that this is a control message L = Length field present Res = Reserved S = Sequence Number Present Ver = Version Length = Total length of the message in bytes, including the header Control Connection ID = identifier for the control connection Ns = sequence number for this control message Nr = sequence number expected in the next control message to be received Figure 7: L2TPv3 control message header 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | L2TPv3 Session Header | 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | L2-Specific Sublayer | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Tunnel Payload ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ L2TPv3 Session Header = specific to the encapsulating PSN over which the L2TPv3 traffic is delivered L2-Specific Sublayer = contains control fields that are used to facilitate the tunneling of each frame Figure 8: L2TPv3 data message header over IP "RFC 3985" . PWE3 defines an encapsulation function which encapsulates service-specific bit streams, cells, or PDUs. This service-specific data is encapsulated at the ingress node in order to be sent over the PSN. The encapsulation is done by the Provider Edge (PE), then the data is carried across a PSN tunnel. PWE3 is an encapsulation protocol which emulates a Layer 2 service over a PSN network. However, to tunnel the data, it needs a tunneling protocol such as L2TP or MPLS. This protocol uses a Host isolation method if it is used with L2TP and a Core isolation method if it is used with MPLS. 4.2 Core isolation The Core isolation method requires that each node of the network possesses a wider knowledge of the topology than when using the Host isolation method. As a matter of fact, in the Core isolation method each node on the path (switch or router) has to check the packet in order to verify if it can be forwarded toward its destination. The benefit of such an isolation method is that the packet is dropped at the closest node from the source if the destination is not reachable for policy reasons. This method considerably reduces traffic, by preventing the transmission of useless traffic to nodes which are not concerned by such traffic. However it is necessary to have a global view of the network topology in order to transmit packets. This implies either a pre-configured network with strict rules or an auto-configurable network. 
The first case means that the topology is rigid, which is contrary to virtualization principles such as live migration. In the second case, it would increase the waiting time before two entities of the network are able to communicate. This increase is due to the time needed for exploring and sharing the network's information.

4.2.1 Core isolation for Layer 2 networks

Both Core isolation protocols introduced in this section require the underlying network to be a Layer 2 network.

4.2.1.1 VLAN: Virtual LAN

A VLAN emulates an ordinary LAN over different networks as defined in the "802.1Q IEEE standard". The nodes belonging to a VLAN are members of this VLAN. A member of a VLAN communicates with the other members of the VLAN as if they were on the same LAN, regardless of their geographical location. VLAN members are in a logically separated LAN and share a single broadcast domain. They do not know that they are not on the same physical LAN. The other nodes, which are not members of the VLAN, will not see the traffic from the VLAN and will not receive any of the broadcast messages from the VLAN. All the traffic from a VLAN is isolated from the rest of the network.

There are three methods to recognize the members of a VLAN. The first method is port based. The switch knows that the node connected at the specified port is a VLAN member. The specified port is tagged and then processes VLAN-only messages. The second method is based on the recognition of the MAC address, and the third method is based on the recognition of the IP address. Independent of the method, the packets of the VLAN are tagged with a 4-byte header (Figure 9) between switches and routers. This header contains the VLAN ID (VID) field, which is 12 bits long. The VID is used to know which VLAN the message belongs to, since switches and routers can multiplex VLANs on a link. Such a link is called a VLAN trunk.
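The 4-byte tag described above (Figure 9) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the value 0x8100 is the TPID that identifies a customer 802.1Q tag.

```python
import struct

TPID_8021Q = 0x8100  # Tag Protocol Identifier of an 802.1Q tag


def build_vlan_tag(vid: int, pcp: int = 0, cfi: int = 0) -> bytes:
    """Return the 4-byte tag: TPID (16 bits), PCP (3), CFI (1), VID (12)."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("the VID is a 12-bit field")
    if not 0 <= pcp <= 7 or cfi not in (0, 1):
        raise ValueError("PCP is 3 bits wide and CFI is 1 bit wide")
    tci = (pcp << 13) | (cfi << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag: bytes) -> dict:
    """Split a 4-byte tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag[:4])
    return {"tpid": tpid, "pcp": tci >> 13, "cfi": (tci >> 12) & 1, "vid": tci & 0xFFF}
```

The 12-bit width of the VID is what bounds the number of VLAN identifiers to 2^12 = 4096.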
If a VLAN member has moved and the VLAN is configured to use MAC addresses, the VLAN can recognize that the member has moved. The VLAN can then automatically reconfigure itself without the need to change the member's IP address.

VLANs have several advantages. They facilitate the administration of logical groups of stations. They allow stations to communicate as if they were on the same LAN. The traffic of a VLAN is only sent to members of the VLAN, which allows flow separation. A VLAN diminishes the size of a broadcast domain and so improves bandwidth. There is also a security improvement thanks to a logical isolation of the VLAN members. An ISP agreement is not needed to establish a VLAN.

A disadvantage of VLANs is that there are only 4096 VLANs because of the size of the tag. To solve this, the IEEE 802.1ad standard, presented in Section 4.2.1.2, has been developed and it increases the number of VLANs. Another issue in the original definition of 802.1Q is the lack of a control plane which enables automatic provisioning of the path on each switch. This last issue is fixed by the use of the GARP VLAN Registration Protocol (GVRP), which is a Generic Attribute Registration Protocol application.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             TPID              | PCP |*|          VID          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TPID = Tag Protocol Identifier
PCP = Priority Code Point
* CFI = Canonical Format Identifier
VID = Unique VLAN Identifier

Figure 9: VLAN Header

4.2.1.2 802.1ad - Provider Bridges

The first draft of the 802.1ad standard was released in 2002 and was intended to enable a service provider to offer separate LAN segments to its users over its own network. Therefore, both the user and the provider possess their own VLAN field, which increases the number of available VLANs. Even if this solution was proposed before the growth of cloud data centers, it has been adapted to fit cloud data center networks; therefore we present this solution in Section 5.

4.2.2 Core isolation protocols for Layer 3 networks

In this section we present three Core isolation protocols which can be used only if the underlying network is a Layer 3 network.

4.2.2.1 MPLS: Multiprotocol Label Switching

Multiprotocol Label Switching (MPLS), defined in "RFC 3031", is a circuit technique that uses label stacks on packets in order to forward them. MPLS uses Layer 3 (IP) routing techniques with Layer 2 forwarding in order to increase the performance/price ratio of routing devices and to be open to new routing services invisible at the label forwarding level.

MPLS decreases the processing time of each packet, with only a label of 20 bits (Figure 10) to look at to forward the packet. It has the ability to work over any Layer 2 technology such as ATM, Frame Relay, Ethernet, or PPP. MPLS has traffic engineering techniques with the Resource reSerVation Protocol (RSVP) or the Constraint-based Routing Label Distribution Protocol (CR-LDP), and enables Quality of Service (QoS) when using DiffServ.

MPLS packets are named "labeled packets" and the routers which support MPLS are called "Label Switching Routers" (LSR). The packets are labeled, at the ingress LSR, depending on the forwarding equivalence class (FEC) they belong to. Those labels are used locally: each LSR changes the label depending on the label that the next LSR in the path has announced for the FEC. For each FEC there exists at least one path across the MPLS network. This path is a Label Switched Path (LSP). All the packets of one FEC take the same LSP. On this LSP, the labeled packets are forwarded based on their labels. After all the label changes, the egress LSR removes the label.

MPLS creates a tunnel for the traffic from the rules it uses. It is also possible to manually edit the labels to define a LSP through the MPLS network. The traffic is tunneled all along the path thanks to the configuration and rules established in the control plane.

In order to use MPLS, the network must be MPLS-ready and configured with a FEC for the traffic. ISP intervention is needed to have a FEC configured, meaning extra cost for the client. However, customers could have QoS for their traffic.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Label Value              | EXP |S|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
EXP = Experimental
S = bottom of stack bit
TTL = Time To Live

Figure 10: MPLS header

4.2.2.2 GMPLS: Generalized Multi-Protocol Label Switching

GMPLS was proposed in "RFC 3471" and updated in "RFC 3945" as an extension of MPLS. This extension adds support for new switching types such as Time-Division Multiplexing (TDM), lambda, and fiber port switching. To support those new switching types, GMPLS has new functionalities which modify the exchange of labels and the Label Switched Path (LSP) unidirectional characteristic. MPLS forwards data based on a label, but the new switching techniques are not based on header processing, so GMPLS must define five new interfaces on the Label Switching Routers (LSR).

1. The Packet Switch Capable (PSC) interface, like the one from MPLS, uses the header of the packet for routing.

2. The Layer-2 Switch Capable (L2SC) interface uses the frame header, like the MAC header or the ATM header, to forward the frame.

3. The Time-Division Multiplex Capable (TDM) interface switches data thanks to the data's time slot in a repeating cycle.

4. The Lambda Switch Capable (LSC) interface receives data and switches it via the wavelength on which it was received.

5. The Fiber-Switch Capable (FSC) interface switches the data based on its position in physical space.

For the LSC interface, the header is 32 bits long and contains only a Label field (Figure 11). The other interfaces use the same header, which contains a Label field with a variable length (Figure 12). However, to establish a circuit, two interfaces of the same type are needed at each end.

In GMPLS it is possible to establish a hierarchy of LSPs on the same interface or between different interfaces. If it is on the same interface, it means that the LSR was able to multiplex the LSPs. If it occurs between interfaces, then that means that the LSP starts with one type of interface and another one is used along the path. If such an interface change happens on the path, then the original LSP is nested into another LSP. This new LSP must end before the original LSP in order to have the same interface type for the final one as the first one. For example, the LSP starts and ends on a PSC interface and along the way the interface changes into FSC, so the PSC LSP is nested into a FSC LSP.

Like MPLS, GMPLS uses the control plane to establish the rules and labels used to route all the data of an LSP, which is tunneled through a unique path. GMPLS extends LSR capabilities from MPLS by allowing different techniques for data forwarding. However, GMPLS shares the same constraint as MPLS, in that there must be an agreement with the ISP before using it.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             Label                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 11: GMPLS header for Lambda interface

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Label Value                          |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Label: Variable Length

Figure 12: GMPLS header for PSC, L2SC, TDM and FSC interfaces

4.2.2.3 BGP/MPLS IP Virtual Private Networks

In Border Gateway Protocol (BGP) / MultiProtocol Label Switching (MPLS), described in "RFC 4364", the idea is to use the MPLS label to forward the packet in the network and BGP to exchange the route information between the LSRs. As shown in Figure 13, clients need to install at least one Customer Edge (CE) router at each site they want to connect. The CE has to know every private IP address from the site it is in. The CEs are then connected to Provider Edges (PE) provided by the ISP. Each PE learns all the IP addresses accessible through the CE it is connected to and then uses BGP to exchange the addresses with the other PEs over the ISP's core network. The PE creates one or more Virtual Routing and Forwarding (VRF) table(s) containing the information of the path to each PE and each device in its local network. The core network routers, which work with MPLS, do not know any of the clients' addresses.

A Virtual Private Network (VPN) contains at least two PEs to connect two sites. In order to create such a network, the client and the ISP must have an agreement. However, a BGP/MPLS VPN system allows the overlapping of address spaces between VPNs, so clients could use any addresses they want in their VPN. BGP/MPLS IP VPNs grant privacy if the network is well configured. But there is no encryption, no authentication, and no integrity check method. In order to add security measures, IPsec must be used.

Figure 13: BGP/MPLS network

5 Network isolation in cloud data center with multi-tenant capabilities

The solutions and protocols shown in Section 4 are not appropriate for isolation in Cloud Data Centers (Cloud DC) prone to virtualization. New data centers use virtualization technologies in order to increase the number of tenants sharing the same infrastructures, and the limits of those isolation techniques are not sufficient for accommodating all clients. For example, GRE provisioning is not scalable. We must know beforehand how many clients there will be in the data center, which is not possible in virtualized data centers. PPTP is end-user initiated because of its design, thus preventing the administrator of the data center from using it freely. The VLAN limit of 4096 different identifiers is very small in comparison to the number of clients in a virtualized data center.

Already in 2010, VMware users were running an average of 12.5 virtual machines per physical server. However, those users were not necessarily professional data centers, and consequently may not have purchased the best servers. Another source gives the number of VMs per server as a ratio of 25 VMs per server, which is expected to grow to 35 VMs per server. However, VMware is currently advertising that some of its servers can host 29 VMs. We need to be careful with these statements because we do not know what kind of VMs, work-intensive or not, they take into account in their calculations. Nevertheless, sticking with 29 VMs per server and each VM belonging to a different client, the VLAN limit is reached with 141 servers, which is a small number of servers for a cloud data center. The number of servers expected in an Amazon data center has been estimated at 46,000, and approximations of the number of servers owned by the largest data center companies are also available.
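The figure of 141 servers follows from dividing the 4096 VLAN identifiers by 29 VMs per server, under the stated assumption that every VM belongs to a different client. A minimal Python sketch of this arithmetic (the function and constant names are illustrative):

```python
VLAN_ID_BITS = 12  # width of the VID field in the 802.1Q tag


def servers_until_exhaustion(vms_per_server: int, id_bits: int = VLAN_ID_BITS) -> int:
    """Number of fully loaded servers that fit before identifiers run out,
    assuming one identifier per VM (each VM belongs to a different tenant)."""
    return (2 ** id_bits) // vms_per_server


# 4096 identifiers shared by 29 VMs per server: 141 full servers use 4089
# identifiers, and a 142nd server can no longer be filled.
print(servers_until_exhaustion(29))
```

Passing a wider identifier through the `id_bits` parameter shows how quickly the bound moves once the identifier space grows.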
MPLS (Section 4.2.2.1) could be a solution for Layer 3 data centers as it has enough different labels to accommodate more than a million customers. However, it is not widely used in data centers because of a complexity issue concerning the distribution of the labels over the network and because of its cost.

In this section we group solutions based on two criteria. The type of isolation (Host isolation or Core isolation) is the first criterion. Then, the second criterion to further separate the solutions is the layer of the underlying infrastructure required by the solution.

5.1 Host isolation for Cloud DC

In this section we present 8 solutions which we consider as Host isolation solutions designed for Cloud DC. Those solutions are: Diverter, BlueShield, NetLord, LISP, NVGRE, STT, VL2, and DOVE.

5.1.1 Host isolation for Layer 2 Cloud DC

Protocols introduced in this section use the Host isolation technique and work over a Layer 2 network.

5.1.1.1 Diverter

Diverter creates an overlay network with a software-only approach, thus alleviating the need for manual configuration of switches and routers. The software module, called VNET, is installed on the host of each physical server. The server's packets (VMs and host packets) and the packets from the network are intercepted by the VNET, which processes them. During this process, the VNET replaces the MAC addresses of the packet in order to have no virtual addresses appearing on the core network. The destination MAC address is replaced by the MAC address of the physical server hosting the destination VM. The source MAC address is replaced by the MAC address of the server which hosts the source VM. In the Layer 2 core network, the switches perform packet forwarding using the server's MAC address.

Tenant isolation is done thanks to the VNET's control of the packet. If there is no rule in both VNETs allowing the communication between the VMs, then the packet is not sent by the ingress VNET. If it is mistakenly sent, then the packet is dropped by the receiver VNET. The control of the packet is thus done twice, once at each VNET. This implies that both VNETs must have the same rules allowing this communication between the two VMs.

In Diverter the tenants cannot choose the addressing scheme they want. They must use IP addresses that follow a specific format, which is:

10.tenant.subnet.vm

where tenant is the tenant ID, subnet is the number of the subnet belonging to the tenant in which the VM is present, and vm is the number of the VM in the subnet. With this addressing scheme there is no risk of having identical addresses.

In conclusion, Diverter provides Layer 3 network virtualization, over a large flat Layer 2 network, in which tenants are isolated. Tenants can also control their own IP subnets and VM addresses as long as they respect the restrictions on IP addresses imposed by Diverter.

Figure 14: BlueShield architecture (adapted from figure of BlueShield paper)

5.1.1.2 BlueShield

BlueShield is an architecture which neither adds a header nor modifies the already existing header. It also does not use a tag or VLAN-like value to separate tenants' traffic. Instead, it prevents the tenants' traffic from being sent on the Layer 2 network by blocking address resolution and preventing the configuration of static hardware address entries. With these characteristics, BlueShield provides a complete isolation between tenants.

In order to allow communication between VMs of the same tenant (or not, depending on the tenants' demands), BlueShield uses a BlueShield Agent (BSA) in each VM and a vSwitch at each server (Figure 14). The BSA sees all the ARP requests made by the VM and converts them into directory look-up (DLU) requests addressed to one or multiple Directory Servers (DSs). The ARP requests are then dropped by the vSwitch of the server before reaching the NIC and the network. The DS searches in its rules whether or not the source VM can communicate with the destination VM. If communication is not allowed, then the DS does not answer and the VMs cannot communicate. Otherwise the DS answers the request. As the BSA can send a request to multiple DSs, they all answer, so there is a requirement for synchronization of the DSs' rules.

BlueShield defines Echelon VMs as those whose task is to increase security and isolation. These Echelon VMs are disseminated on the network and share security rules that they enforce by scanning all the packets passing through them. In order for the traffic to pass through an Echelon VM, the rules in the DS must be modified accordingly. The DS, instead of answering with the MAC address of the destination VM, will send the Echelon VM MAC address.

In conclusion, BlueShield is a technique which allows tenants' data isolation but not the isolation of tenants' address space. In addition, the rules in the DS, enforcing this isolation, must be the same on all the DSs and the Echelon VMs, if these last devices are used. The establishment and configuration of these rules lies with the administrator, and the techniques are those of his or her choosing.

5.1.2 Host isolation for both Layer 2 and Layer 3 Cloud DC

In this section we present one Host isolation protocol which can be used over either a Layer 2 or a Layer 3 network.

5.1.2.1 NetLord

In "NetLord: A Scalable Multi-Tenant Network Architecture for Virtualized Datacenters" the authors proposed a new multi-tenant network architecture. The core network is a Layer 2 (Ethernet) network and the use of Layer 3 (IP) is done at the last hop between the edge switch and the server. To provide tenant traffic isolation in the network, NetLord encapsulates tenant data with both Layer 3 (IP) and Layer 2 (Ethernet) headers (Figure 15). To do so, NetLord uses an agent in the hypervisor of the server to control all the VMs on the server. This agent has to encapsulate, route, decapsulate and deliver the tenant's packet to the recipient.
The source NetLord Agent (NLA) encapsulates the tenant's packet with an IP header and an Ethernet header. The Ethernet destination address of the added Ethernet header is the MAC address of the egress edge switch. The IP destination address of the added IP header is composed of two values. The first is the number of the switch port (P) to which the machine is connected, and the second is the Tenant_ID (TID). This IP address is analyzed at the egress edge switch, where it is used to route the packet toward the correct server by using the port number P from the IP address. Then, when the packet is received by the server, it is handed off to the destination NLA. This NLA has to use the TID part of the IP address to send the data to the correct tenant.

The use of IP routing at the last hop allows the use of a single edge switch MAC address in order to communicate with the VMs on the server beyond this edge switch. This way, physical and virtual machines from other servers only have one MAC address to store, the edge switch MAC address. In addition, the MAC addresses of the VMs are not exposed on the core network. However, the tenant's ID is exposed in the outer IP header. This exposure can be used by the provider to apply per-tenant traffic management in the core network without the need for per-flow Access Control Lists (ACLs) in the switches.

NetLord also provides address-space isolation for tenants. The tenants are able to use any Layer 2 or Layer 3 addresses because NetLord does not impose restrictions on addresses, and there is no risk of badly routed packets because of these addresses. As stated earlier, the ingress switch uses the MAC address of the egress switch on the core network to forward the data. Then the IP address, composed of the port number and the TID, is used at the egress switch. At any given time, the addresses defined by the tenant are only visible in the tenant virtual network.
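To make the role of the outer IP destination address concrete, the following Python sketch packs a switch port number P and a Tenant_ID into a single IPv4 address and recovers them again. The bit split used here (8 bits for P in the first octet, 24 bits for the TID) and the function names are assumptions made for illustration; the text above only states that the address is composed of these two values.

```python
import ipaddress

PORT_BITS = 8   # assumed width of the switch port number P
TID_BITS = 24   # assumed width of the Tenant_ID


def encode_outer_dst(port: int, tenant_id: int) -> ipaddress.IPv4Address:
    """Build the outer IP destination address from P and TID."""
    if not 0 <= port < (1 << PORT_BITS):
        raise ValueError("port number does not fit in PORT_BITS")
    if not 0 <= tenant_id < (1 << TID_BITS):
        raise ValueError("tenant ID does not fit in TID_BITS")
    return ipaddress.IPv4Address((port << TID_BITS) | tenant_id)


def decode_outer_dst(addr) -> tuple:
    """Return (P, TID): P is used by the egress edge switch to select the
    server port, TID by the destination NLA to select the tenant."""
    value = int(ipaddress.IPv4Address(addr))
    return value >> TID_BITS, value & ((1 << TID_BITS) - 1)
```

Under this assumed layout, the egress edge switch only needs to look at the upper bits of the address, while the tenant identifier stays visible to any core device that inspects the outer header.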
The tenant data between the egress and ingress switches are conveyed over the Layer 2 network thanks to VLANs. In order to choose which VLAN to use, NetLord applies the SPAIN selection algorithm. However, to support the SPAIN multipath technique and store per-tenant configuration information, NetLord uses Configuration Repositories, which are databases. It also uses the same mechanisms as Diverter to support virtual routing.

To establish a NetLord architecture, edge switches that support IP forwarding must be used, which is not a common feature for commodity switches. In addition, the use of SPAIN implies a scalability issue and there is no support for bandwidth guarantees. In conclusion, NetLord provides tenant isolation but has some drawbacks in other areas.

5.1.3 Host isolation for Layer 3 Cloud DC

The protocols introduced in this section use the Host isolation technique and require a Layer 3 network.

5.1.3.1 LISP: The Locator/Identifier Separation Protocol

The Locator/Identifier Separation Protocol (LISP), presented in "RFC 6830", aims at splitting the routing and the addressing functionalities. Currently, the IP address, a single field, is used both for routing and for addressing a device. In LISP, the routing functionality is done by Routing Locators (RLOCs) and the addressing functionality is done by Endpoint Identifiers (EIDs). An RLOC is an address, the same size as an IP address, of an Egress Tunnel Router (ETR). This RLOC indicates the location of the device in the network. This value is the one used by the Ingress Tunnel Router (ITR) to route the packet through the network toward the ETR. The ETR is the gateway of the private network. Then, to route the packet to the correct node in the private network, the EID value is used. All the EIDs of the private network are mapped in the ETR. This value also has the same length as an IP address (32 bits for IPv4, or 128 bits for IPv6). Such a split is done by using different numbering spaces for EIDs and RLOCs.
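The separation between EIDs and RLOCs can be pictured as a lookup table held by an ITR: given a destination EID, it returns the RLOC of the ETR behind which that EID lives. The Python sketch below is a simplified illustration under that assumption; the class name, its methods, and the example addresses are hypothetical and do not reproduce the LISP mapping system itself.

```python
import ipaddress


class EidToRlocTable:
    """Toy EID-prefix to RLOC table with longest-prefix matching."""

    def __init__(self):
        self._entries = []  # list of (EID prefix, RLOC) pairs

    def add(self, eid_prefix: str, rloc: str) -> None:
        self._entries.append(
            (ipaddress.ip_network(eid_prefix), ipaddress.ip_address(rloc))
        )

    def lookup(self, eid: str):
        """Return the RLOC for the most specific prefix covering the EID,
        or None when no mapping is known."""
        addr = ipaddress.ip_address(eid)
        matches = [(net, rloc) for net, rloc in self._entries if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda entry: entry[0].prefixlen)[1]


table = EidToRlocTable()
table.add("10.1.0.0/16", "192.0.2.1")   # EIDs of one site, reached via one ETR
table.add("10.1.2.0/24", "192.0.2.2")   # a more specific mapping
```

With such a table, the outer header carries the RLOC returned by the lookup while the inner header keeps the EIDs unchanged.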
By separating the two numbering spaces, LISP improves the scalability of the routing system thanks to the possibility of a greater aggregation of RLOCs than of IP addresses. However, in order to have this better aggregation, the RLOCs must be allocated in a way that is congruent with the network's topology. On the other hand, the EIDs identify nodes within the boundaries of the private network and are assigned independently from the network topology.

Figure 15: NetLord architecture (figure from NetLord paper)

The encapsulation of the packet, in an IPv4 network, is shown in Figure 16. The outer header is the IP header with the RLOC addresses as the source and destination addresses. Then the UDP header is added, followed by the LISP header. The inner header is also an IP header, but the source and destination addresses are now the EID addresses.

LISP is a tunneling protocol that simplifies routing operations such as multi-homed routing and facilitates scalable any-to-any WAN connectivity. It also improves the scalability of the routing system through greater aggregation of RLOCs. However, to benefit from the advantages of LISP, a LISP-enabled ISP must be used. The ISP also benefits from using LISP because it has less information in its routing devices thanks to RLOC aggregation.

5.1.3.2 NVGRE: Network Virtualization using Generic Routing Encapsulation

Network Virtualization using Generic Routing Encapsulation (NVGRE), detailed in "NVGRE: Network Virtualization using Generic Routing Encapsulation", is based on the Generic Routing Encapsulation (GRE) encapsulation method. NVGRE allows the creation of virtual Layer 2 topologies on top of a physical Layer 3 network. The goal of NVGRE is to improve the handling of multitenancy in data centers. Network virtualization is used in order to provide both isolation and concurrency between virtual networks on the same physical network infrastructure.
To improve isolation, NVGRE modifies the GRE header by replacing the Key field with two fields, the Virtual Subnet ID (VSID) and the FlowID (Figure 17). The first 24 bits are for the VSID field and the following 8 bits for the FlowID field. The VSID is used to identify the virtual Layer 2 network. With its 24 bits it is possible to have 2^24 = 16,777,216 virtual Layer 2 networks, which is far more than the 4096 VLANs. The FlowID is used to provide per-flow entropy within the same VSID. The NVGRE packet can be encapsulated in both versions of IP, but it cannot contain an 802.1Q tag. The NVGRE tunnel needs NVGRE endpoints between the virtual and physical networks. Those endpoints could be servers, network devices, or part of a hypervisor. NVGRE uses the scalability of IP addressing to reduce the size of the Top of Rack switches' MAC address tables. For the moment, the fact that NVGRE is a work in progress and not a standard prevents it from being widely deployed, while awaiting possible modifications.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
-------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |Version|  IHL  |Type of Service|          Total Length         |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |         Identification        |Flags|      Fragment Offset    |
 Outer +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Header |  Time to Live | Protocol = 17 |         Header Checksum       |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                    Source Routing Locator                     |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                 Destination Routing Locator                   |
-------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |       Source Port = xxxx      |       Dest Port = 4341        |
  UDP  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |           UDP Length          |        UDP Checksum           |
-------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |N|L|E|V|I|flags|            Nonce/Map-Version                  |
 LISP  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                  Instance ID                  |     LSBs      |
-------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |Version|  IHL  |Type of Service|          Total Length         |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |         Identification        |Flags|      Fragment Offset    |
 Inner +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Header |  Time to Live |   Protocol    |         Header Checksum       |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                           Source EID                          |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                         Destination EID                       |
-------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Inner Header = header on the datagram received from the originating host
Outer Header = header prepended by an ITR
IHL = IP-Header-Length
N = nonce-present
L = 'Locator-Status-Bits' field enabled
E = echo-nonce-request
V = Map-Version present
I = Instance ID
flags = 3-bit field reserved for future flag use
LISP Nonce = 24-bit value that is randomly generated by an ITR when the N-bit is set to 1
LISP Locator-Status-Bits (LSBs) = set by an ITR to indicate to an ETR the up/down status of the Locators in the source site when the L-bit is also set

Figure 16: LISP IPv4-in-IPv4 Header Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Outer Ethernet Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                (Outer) Destination MAC Address                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|(Outer)Destination MAC Address |   (Outer)Source MAC Address   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  (Outer) Source MAC Address                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Optional Ethertype=C-Tag 802.1Q| Outer VLAN Tag Information    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Ethertype 0x0800        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Outer IPv4 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live | Protocol 0x2F |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    (Outer) Source Address                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  (Outer) Destination Address                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
GRE Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|C| |K|S|    Reserved0    | Ver |     Protocol Type 0x6558      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Virtual Subnet ID (VSID)            |    FlowID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Inner Ethernet Header...

C = Checksum Present (must be zero)
S = Sequence Number Present (must be zero)
K = Key Present (must be one)
Virtual Subnet ID (VSID) = 24-bit value used to identify the NVGRE based Virtual Layer-2 Network
FlowID = 8-bit value used to provide per-flow entropy for flows in the same VSID

Figure 17: NVGRE Header

5.1.3.3 STT: Stateless Transport Tunneling Protocol

The Stateless Transport Tunneling Protocol (STT), introduced in "A Stateless Transport Tunneling Protocol for Network Virtualization (STT)", is a new IP-based encapsulation and tunneling protocol which adds a new header (Figure 18) to the packet and also modifies the TCP header. The new header contains a 64-bit Context ID field which can be used to differentiate 2^64 ≈ 1.8 × 10^19 virtual networks. The modifications made to the TCP header concern the meaning and use of both the Sequence Number (SEQ) and the Acknowledgment Number (ACK). The SEQ field is now divided into two parts. The upper 16 bits of the SEQ field indicate the length of the STT frame in bytes. The second part of the SEQ field, the lower 16 bits, is used for the offset, expressed in bytes, of the fragment within the STT frame. Reusing the TCP header allows STT to be easily encapsulated in IP datagrams. The Protocol Number field for IPv4 or the Next Header field for IPv6 has the same value as for regular TCP. An additional difference between TCP and STT is that STT, as the name indicates, does not use state.

The goal of STT is to tunnel packets efficiently, so it supports standard Equal Cost Multipath (ECMP). Nevertheless, STT imposes that all the packets belonging to the same flow follow the same path. The multipath is done on a flow basis and not on a packet basis. However, the most important drawback of STT is the fact that there must not be any middle boxes on the path. If middle boxes are present, then they have to be configured to let STT frames pass through. This implies that access to the middle boxes is required. So, for the moment, it is not feasible to have an STT tunnel between two sites linked by an unmanageable network. In addition, STT is not a standard, so not all devices will be able to work with it.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Outer Ethernet header: (144 bits)
Outer IP Header: (IPv4=160 or IPv6=320 bits)...
Outer TCP-like header: (192 bits)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Sequence Number(*)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Acknowledgment Number(*)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
STT header: (144 bits)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version      | Flags         |  L4 Offset    |  Reserved     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Max. Segment Size          | PCP |V|        VLAN ID        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Context ID (64 bits)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                              ...                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Padding            |             data
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                              ...                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Flags field contains:
o 0: Checksum verified. Set if the checksum of the encapsulated packet has been verified by the sender.
o 1: Checksum partial. Set if the checksum in the encapsulated packet has been computed only over the TCP/IP header. This bit MUST be set if TSO is used by the sender. Note that bit 0 and bit 1 cannot both be set in the same header.
o 2: IP version. Set if the encapsulated packet is IPv4, not set if the packet is IPv6. See below for discussion of non-IP payloads.
o 3: TCP payload. Set if the encapsulated packet is TCP.
o 4-7: Unused, MUST be 0 on transmission and ignored on receipt.
L4 offset = offset in bytes from the end of the STT Frame header to the start of the encapsulated layer 4 (TCP/UDP) header
Max Segment Size = TCP MSS that should be used by a tunnel endpoint
PCP = 3-bit Priority Code Point field
V = 1-bit flag that indicates the presence of a valid VLAN ID
VLAN ID = 12-bit VLAN tag
Context ID = 64 bits of context information

Figure 18: STT header

5.1.3.4 VL2

VL2, presented in , is an architecture designed to allow agility, and notably the capacity to assign any server to any service. To do so, VL2 uses two different IP address families, the location-specific IP addresses (LAs) and the application-specific IP addresses (AAs). This addressing scheme separates the server names, the AAs, from their locations, the LAs. However, this implies that a mapping between AAs and LAs is needed. This mapping is created when application servers are provisioned to a service and assigned AAs. The mapping is then stored in a directory system, which must be reliable. To improve this reliability, the directory system can be replicated across several directory servers, but this implies that those directory servers are kept synchronized. The directory system is used to achieve address resolution, and every server must implement a module called a VL2 agent to contact it. This VL2 agent contacts the directory system to retrieve the LA corresponding to an AA. Each AA is associated with an LA. This LA is the identifier of the Top of Rack switch to which the server, identified by the AA, is connected. The AA remains the same even if the LA changes due to a virtual machine migration or re-provisioning. The AAs are assigned to servers and the LAs to the switches and interfaces. To perform this LA assignment, switches run an IP-based link state routing protocol. VL2 works over a Clos topology and a Layer 3 network. This network routes traffic by LAs, so in order to route the traffic between servers with AAs, encapsulation is needed.
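The AA-to-LA resolution described above can be pictured with a toy directory. This is only a sketch under stated assumptions: the class `Directory`, its methods, and all addresses are hypothetical, and a single in-memory dictionary stands in for the replicated directory servers.

```python
class Directory:
    """Toy VL2-style directory: maps an application address (AA) to the
    locator address (LA) of the ToR switch the server is attached to."""

    def __init__(self):
        self._aa_to_la = {}

    def register(self, aa, la):
        # Called when a server is provisioned, and again after a migration.
        self._aa_to_la[aa] = la

    def resolve(self, aa):
        # What a VL2 agent would ask for before sending to this AA.
        return self._aa_to_la.get(aa)


directory = Directory()
directory.register("20.0.0.55", "10.0.3.1")
before = directory.resolve("20.0.0.55")

# The VM migrates to another rack: the AA is unchanged, only the LA is updated.
directory.register("20.0.0.55", "10.0.7.1")
after = directory.resolve("20.0.0.55")
```

Because senders only ever name the AA, a migration is handled by updating one directory entry rather than by renumbering the server.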
The encapsulation needed to carry AA-addressed traffic over the LA-routed network is done by the VL2 agent, which encapsulates the IP packet in another IP packet (IP-in-IP). The outer destination is the LA associated with the destination AA, obtained from the directory system or from the agent's local cache. In VL2, the isolation of the servers is achieved through rules in the directory system, and those rules are enforced by each VL2 agent. For example, if a server is not allowed to send a packet to a different server, the directory service will not provide an LA to the VL2 agent, and the VL2 agent will drop the packet.

5.1.3.5 DOVE: Distributed Overlay Virtual nEtwork

Distributed Overlay Virtual nEtwork (DOVE) [16, 17] is a technique designed with a centralized control plane over a Layer 3 network. DOVE does not provide an encapsulation protocol and uses other protocols, such as VXLAN, NVGRE, or STT, as long as those protocols can carry two parameters. The first parameter is the virtual network ID and the second, which is optional, is a policy specifier defined by a domain ID. By not providing an encapsulation protocol, DOVE is not limited to Ethernet emulation and could be used over a Layer 2 network.

DOVE provides tenant isolation thanks to the use of an encapsulation protocol whose header is added by dSwitches, and the use of the DOVE Policy Service (DPS). The dSwitches are the edge switches of the DOVE overlay network. They run in each physical server and act as the tunnel endpoint for the VMs of that server. The DPS is a unique component in the DOVE network, whose function is to process the policy requests of the dSwitches. It maintains all the information regarding existing virtual networks as well as their correlation with the physical infrastructure, policy actions, and rules. It is thanks to these policy requests and responses that a dSwitch knows whether a VM can communicate with a different VM, and learns the address of the dSwitch which manages the other VM.
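The request/response exchange between a dSwitch and the DPS might look like the following sketch. It is not taken from the DOVE papers: the class and method names, the reply format, and the rule that VMs of the same virtual network may always communicate are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class PolicyReply(NamedTuple):
    allowed: bool
    dswitch_addr: Optional[str]  # address of the dSwitch hosting the peer VM


class PolicyService:
    """Toy stand-in for the DPS: tracks VM placement and inter-network rules."""

    def __init__(self):
        self._vms = {}         # VM name -> (virtual network ID, dSwitch address)
        self._allowed = set()  # unordered pairs of virtual network IDs

    def add_vm(self, vm, vnid, dswitch_addr):
        self._vms[vm] = (vnid, dswitch_addr)

    def allow(self, vnid_a, vnid_b):
        self._allowed.add(frozenset((vnid_a, vnid_b)))

    def query(self, src_vm, dst_vm):
        """Answer a dSwitch policy request for traffic from src_vm to dst_vm."""
        src = self._vms.get(src_vm)
        dst = self._vms.get(dst_vm)
        if src is None or dst is None:
            return PolicyReply(False, None)
        # Assumption: VMs of the same virtual network may always communicate.
        if src[0] == dst[0] or frozenset((src[0], dst[0])) in self._allowed:
            return PolicyReply(True, dst[1])
        return PolicyReply(False, None)


dps = PolicyService()
dps.add_vm("vm-a1", vnid=10, dswitch_addr="10.0.0.1")
dps.add_vm("vm-a2", vnid=10, dswitch_addr="10.0.0.2")
dps.add_vm("vm-b1", vnid=20, dswitch_addr="10.0.0.3")
```

Each query combines a location lookup with a policy check, so every new flow costs the central service more than a simple address resolution would.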
As a solution using a centralized control plane, DOVE needs a "highly available, resilient, and scalable" device to host the DPS, as stated by the authors. In the PortLand architecture, the fabric manager needs at least 15 CPU cores working nonstop to process the ARP requests of 27,648 end hosts, each making 25 ARP requests per second. In DOVE, the issue is worse with the DPS. In PortLand, it was just a database query to retrieve a PMAC address. In DOVE, the lookup searches for the address of the dSwitch and for the corresponding policy rules and actions in order to determine the next action.

5.2 Core isolation for Cloud DC

In this section we introduce protocols that use the Core isolation technique and have multi-tenant capabilities. We divide them into three categories depending on the layer of the underlying network.

5.2.1 Core isolation for Layer 2 Cloud DC

The five protocols presented in this section require that the underlying network be a Layer 2 network.

5.2.1.1 802.1ad (QinQ)

The IEEE 802.1ad standard, also known as "QinQ", is in fact an amendment to the IEEE standard 802.1Q. This amendment enables service providers to offer isolation to their customers' traffic. In addition to the VLAN information of the client, defined by the 802.1Q standard, the 802.1ad standard defines another VLAN for the provider. The customer VLAN header is called the C-TAG (customer TAG) and is the inner header. The outer header is the S-TAG (Service TAG) for the provider. Figure 19 represents the two VLAN headers. The TPID0 field has a default value of 0x88A8, which differs from the default value (0x8100) of the 802.1Q standard. TPID1 is configured with the default value 0x8100. This differentiation indicates to the switch that there are two TAGs. Thanks to the S-TAG header, the provider can manage only one VLAN for all the VLANs of one client.
The provider is able to offer 2^12 = 4096 VLANs to each of its 4096 clients, which results in 2^12 × 2^12 = 16,777,216 different VLANs. If this is the solution chosen by the provider, then the 802.1ad VLAN management is identical to the 802.1Q VLAN management because the provider only cares about the S-TAG header. It is the client's responsibility to manage its 4096 VLANs in its own network. However, most of the time, one client does not need 4096 VLANs and the provider has more than 4096 clients. So instead of using both TAGs separately, the provider combines both TAGs in order to have 16,777,216 VLANs. The management of such a solution is more complex than that of the 802.1Q solution, but it yields greater scalability. To switch the frames, the switches have to recover the VID of both TAGs and look it up in a database with up to 16,777,216 entries instead of 4096. This implies more work to obtain the VLAN ID and more time to verify it in the database, which also uses more memory because of its increased size. It means more CPU, more memory, and more latency at each switch. Additionally, irrespective of the way both TAGs are managed, the overhead is increased by four bytes.

The advantage of the 802.1ad standard is that it raises the limit of possible VLANs from 4096, with 802.1Q, to 16,777,216, which should be sufficient for network growth during the next few years. If this new limit is still too small, then there is the possibility of using the 802.1ad VLAN TAG stacking solution and adding more VLAN TAGs to the header. However, this is a nonstandard solution and might result in overhead issues, because each added VLAN TAG increases the header by four bytes for only 12 bits of VID.
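The tag arithmetic above can be checked with a short sketch that builds the two stacked tags. The TPID values and field widths are those given in the text; the function names, and the choice to combine the two VIDs as `(S-VID << 12) | C-VID`, are illustrative assumptions rather than anything mandated by the standard.

```python
import struct

TPID_S_TAG = 0x88A8  # outer, provider tag (TPID0)
TPID_C_TAG = 0x8100  # inner, customer tag (TPID1)


def tci(pcp, dei, vid):
    """Pack the 16 bits that follow a TPID: 3-bit PCP, 1-bit DEI, 12-bit VID."""
    if not (0 <= pcp < 8 and dei in (0, 1) and 0 <= vid < 4096):
        raise ValueError("field out of range")
    return (pcp << 13) | (dei << 12) | vid


def qinq_tags(s_vid, c_vid, pcp=0, dei=0):
    """Return the 8 bytes formed by the S-TAG followed by the C-TAG."""
    return struct.pack(
        "!HHHH",
        TPID_S_TAG, tci(pcp, dei, s_vid),
        TPID_C_TAG, tci(pcp, dei, c_vid),
    )


def combined_id(s_vid, c_vid):
    """Treat the (S-VID, C-VID) pair as a single 24-bit identifier."""
    return (s_vid << 12) | c_vid
```

Each tag is four bytes, so the stacked pair occupies eight bytes, four more than a single 802.1Q tag, and the combined identifier ranges over 2^24 = 16,777,216 values.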
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         TPID0 = 0x88a8        | PCP |*|         S-VID         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         TPID1 = 0x8100        | PCP |*|         C-VID         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

TPID0 = S-TAG Protocol Identifier
TPID1 = C-TAG Protocol Identifier
PCP = Priority Code Point
*DEI = Drop eligible indicator
S-VID = Unique Service VLAN Indicator
C-VID = Unique Customer VLAN Indicator

Figure 19: 802.1ad header

5.2.1.2 802.1ah (mac-in-mac)

The 802.1ah IEEE standard was developed after the 802.1ad standard in order to provide a method for interconnecting Provider Bridged Networks. This protocol is intended for network providers, in order to meet their need for more service VLANs. This standard is also known as Provider Backbone Bridges (PBB) or "mac-in-mac". As the latter name indicates, the idea is to add another MAC header on top of the existing MAC header. This new MAC header is added by a Provider Edge (PE) switch. This allows the core network switches to store only the MAC addresses of the PE switches, so no MAC information of the client is used for switching inside the core network. All the mapping work is done by the PE switches, and they are the ones responsible for encapsulating and decapsulating the messages. The encapsulation is done by adding a new MAC header to the message. The resulting header (Figure 21) is composed of two different MAC headers. The first one is the new header from the 802.1ah standard. This MAC header can be divided in two parts. The first part is the one with the Backbone components. The fields of this part are:

• MAC Backbone Destination Address (B-DA), 6 bytes long
• MAC Backbone Source Address (B-SA), 6 bytes long
• EtherType with a size of 2 bytes and a value of 0x88a8
• Priority Code Point (3 bits) and the Drop Eligible Indicator (1 bit)
• Backbone VLAN indicator (B-VID) with a size of 12 bits

After this Backbone part, the second part, called the Service encapsulation, is three bytes long and contains the following fields:

• EtherType with a value of 0x88e7 on two bytes
• Priority Code Point (3 bits) and the Drop Eligible Indicator (1 bit)
• Used Customer Address (1 bit), which indicates if the customer address is valid or not
• Interface Service Instance Indicator (I-SID) with a size of 20 bits

With this new 802.1ah header we now have another MAC header for the provider to use. There are now 4096 VLANs possible with the B-VID. In each VLAN there are 2^20 = 1,048,576 supported services with the I-SID field. This could amount to a total of 4,294,967,296 VLANs with only the new header. Additionally, only the PE switches have to learn the customers' MAC addresses (C-DA and C-SA) and have to add and remove the new 802.1ah header.

The 802.1ah standard is an evolution of the 802.1ad standard, which is itself an evolution of the 802.1Q standard. Each new standard has added information to the header of the message. Figure 20 shows the evolution of the 802.1 header. We can see that in order to increase the number of VLAN identifiers, the size of the header keeps increasing. This increase implies, as stated for the 802.1ad standard (Section 5.2.1.1), that switches, at least the PE switches, use more CPU time to process the header and use more memory to save all the information of the VLANs.

Figure 20: 802.1, 802.1Q, 802.1ad and 802.1ah frame formats (figure from "IEEE 802.1ah Basics")

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             B-DA                            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          B-DA (cont.)          |             B-SA            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                          B-SA (cont.)                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         TPID0 = 0x88a8        | PCP |*|         B-VID         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         TPID1 = 0x88e7        | PCP |*|!| RES |     I-SID
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          I-SID (cont.)         |             C-DA            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                          C-DA (cont.)                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             C-SA                            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          C-SA (cont.)          |         TPID2 = 0x88a8        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PCP |*|         S-VID         |         TPID3 = 0x8100        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PCP |*|         C-VID         |      Ethertype of Payload     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Payload                          ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

B-DA = Backbone Destination Address
B-SA = Backbone Source Address
TPID0 = B-TAG Protocol Identifier
PCP = Priority Code Point
*DEI = Drop Eligible Indicator
B-VID = unique Backbone VLAN Indicator
TPID1 = I-TAG Protocol Identifier
!UCA = Used Customer Address (0=Valid, 1=Not Valid)
RES = Reserved
I-SID = Interface Service Instance Indicator
C-DA = Customer Destination Address
C-SA = Customer Source Address
TPID2 = S-TAG Protocol Identifier
S-VID = unique Service VLAN Indicator
TPID3 = C-TAG Protocol Identifier
C-VID = unique Customer VLAN Indicator

Figure 21: 802.1ah header

5.2.1.3 Private VLANs

Private VLANs is a solution developed by Cisco and presented in "RFC 5517". This solution is based on the aggregated VLAN model proposed in "RFC 3069". The idea is to have a principal VLAN subdivided into secondary VLANs. The principal VLAN broadcast domain is therefore divided into smaller subdomains. A subdomain is defined by the designation of the switch's ports group.
In  there are three port designations. These port designations are as follows: 1. Isolated port: An isolated port can not talk with an isolated port or a community port. 2. Community port: A community port belongs to a group of ports. Those ports can communicate between themselves and with any promiscuous port. 3. Promiscuous port: A promiscuous port can talk with all the other ports. In order to create the subdomains within a VLAN domain, the VLAN ID is not enough. An additional VLAN ID is used. To refer to a specific Private VLAN, at least one pair of VLAN IDs is necessary. A pair of VLAN IDs is composed of one primary VLAN ID and of one secondary VLAN ID. The primary VLAN ID is the VLAN identifier of the whole Private VLAN domain. This scheme of VLAN pairing only requires the traffic from the primary and secondary VLANs to be tagged following the IEEE 802.1Q standard. It only uses a single tag at most, thanks to the 1:1 correspondence, between a secondary VLAN and its primary VLAN. The Private VLAN technique allows for a greater number of VLANs thanks to the recycling of VLAN IDs in secondary VLANs. It also allows for better addresses assignment in a domain because these addresses are shared between all the members of the private VLAN domain. 18.104.22.168 PortLand The Portland architecture  is one that uses a centralized control plane, with the core network working 17 Figure 22: A fat tree topology (figure from PortLand paper ) first eight bits, used for forwarding the data through the core switches. Then at the aggregation switches, the position part is used and, at the next level, the edge switches uses the port part. Finally the last part, the vmid, is used by the server to know to which VM to deliver the packet. Portland design implies that the fabric manager learns all the correspondences between PMAC addresses and IP addresses and uses this table to answer ARP requests from the edge switches. 
The edge switches then use the PMAC addresses they received to change the destination addresses. Since PMAC addresses are hierarchical, this enables switches to have pod (16 bits) is the pod number of the edge switch smaller forwarding tables. position (8 bits) the position of the end host in the The fact that the fabric manager, a single machine, pod has to manage all the ARP traffic makes this architecture not able to scale. For example in a data cenport (8 bits) is the switch port number the host is ter with 27,648 end hosts (not tenants) and each host connected to makes 25 ARP requests per second, the fabric manvmid (16 bits) is used to differentiate the virtual ager will need approximatively 15 CPU cores working non-stop to only manage the ARP requests. machine on the physical machine at Layer 2. The topology of the network must be a multi-rooted fat-tree  as in Figure 22. To manage forwarding and addressing, a fabric manager is used. A fabric manager is a user process running on a dedicated machine which manages soft states in order to limit or even eliminate the need for administrator configuration. In addition to the fabric manager, Portland introduces new MAC addresses for each end host. This new MAC address, called Pseudo MAC (PMAC) address, encodes the position of the end host in the topology. The PMAC is 48 bits long and is composed of four parts. 1. 2. 3. 4. The PMAC is not known by the end host which keeps using its actual MAC (AMAC) for its packets. When an edge switch sees a new AMAC, coming from a connected machine, it has to create a new PMAC and map the IP address with the AMAC and the PMAC. It then has to announce the new mapping between the IP address and the PMAC address to the fabric manager. This way when an edge switch wants to forward a message with only the IP address, it will do an Address Resolution Protocol (ARP) request which will be processed by the fabric manager and receive the PMAC address in the answer. 
Edge switches are responsible for mapping the PMAC to the AMAC. They also have to replace the AMAC with the PMAC for outgoing packets and replace the PMAC with the AMAC for arriving packets. This way Portland can uses a hierarchical PMAC addressing with only the pod part, the 22.214.171.124 SEC2 : Secure Elastic Cloud Computing Secure Elastic Cloud Computing(SEC2)  is an architecture which uses a centralized control plane over a Layer 2 core network. In this architecture, the network is divided in one core domain and several edge domains (Figure 23). An edge domain possesses an identifier, the edge id (eid), which is unique among the edge domains. Each edge domain is connected to the core domain via Forwarding Elements (FEs) which manage address resolution and enforce policy rules. In an edge domain, tenants are isolated thanks to the use of VLANs. A tenant’s subnet is identified by a unique Customer network id (cnet id). This implies that there is a 1:1 correspondence between a VLAN ID and a cnet id. In SEC2 there are only 4096 tenants in an 18 • (cnet id, eid) ↔ VLAN id. To identify which VLAN to use in the receiver edge domain. • cnet id ↔ rules and actions. To know if both tenants agree to communicate. To allow inter sub-network communication, each tenant must have at least one public IP address stored in the CC. Even if the design uses a centralized controllers, the CC can be distributed and the information can be divided per tenant. The author provides an example: For example, different customers can be assigned to different CCs by using Distributed Hash Table (DHT). Since the management and policy control of different customer networks are relatively independent, such partition does not affect the functionality of CC. Figure 23: SEC2 Architecture (adapted from figure of SEC2 paper ) edge domain. However, the VLAN ID can be reused in a different edge domain so the maximum number of VLANs allowed does not limit the number of tenants. 
As SEC2 does not limit the number of edge domains, to increase the number of tenants, the solution is to create a new edge domain. A tenant’s VMs are identified by the combination of the cnet id and their IP addresses. These IP addresses are freely chosen by the tenant and there is no restriction on them. The VM MAC address is then mapped to this combination (cnet id, IP). The MAC address is not needed by the FE to forward the packets because the pair (cnet id, IP) is unique.

For resolving addresses and enforcing rules, FEs must obtain the information from the Central Controller (CC) on an on-demand basis. When a VM wants to send a packet to a different VM, the VM only knows the IP address and therefore will do an ARP request. This ARP request is intercepted by the FE which looks in its local cache to see if the answer is present. If not, it sends a request to the CC which answers with information such as the MAC address of the receiver, the eid of the domain in which the receiver is located, and the VLAN ID of the receiver. The FE then answers the ARP request with the MAC address. The other information is saved in the FE’s local database. When the packet reaches the ingress FE, the FE will encapsulate the packet with a MAC header. The destination address will be the eid previously received, and the VLAN number will be replaced by the one received from the CC.

For the CC to be able to answer the FE requests, the core, edge, and network information must be stored. The following mappings are maintained by the CC:

• VM MAC ↔ (cnet id, IP). To resolve the IP address of a VM and obtain its MAC address.
• VM MAC ↔ edge domain id (eid). To know in which edge domain the VM is located.
• eid ↔ FE MAC address list. To determine between which FE to establish the data tunnel if there are multiple FEs for a single edge domain.

SEC2 provides tenant isolation and also address-space isolation thanks to the use of VLAN in edge domains. FEs enforce the rules and policies that are stored in the CC, which also prevents inter-tenant communication if it has not been previously agreed.

126.96.36.199 VNT: Virtual Network over TRILL

In  the authors propose a new technique of overlay network done over a Layer 2 network using the Transparent Interconnection of Lots of Links (TRILL) protocol [68, 69]. This overlay network is called Virtual Network over TRILL (VNT). In order to provide tenant isolation, VNT adds a Virtual Network Identifier (VNI) field in the TRILL header (Figure 24). The VNT header is composed as follows. The first 64 bits correspond to the basic TRILL header. They are followed by a block of 32 bits describing the criticality of the options. Then there are a reserved field of 18 bits and a flow ID field of 14 bits. The VNT extension is added as an option in the header, with the Type, Length, Value (TLV) format, and needs a 64-bit block. The VNI field is 24 bits long and can differentiate approximately 16 million tenants. A VNI is unique in the core network and is associated with one tenant.

To apply this VNT extension to the packet, a new network component is introduced and is called a Virtual Switch (VS). A VS has to manage all the interfaces with the same VNI Tag (all the interfaces of one tenant). The provider administrator must link every new VM of a tenant to the unique VS managing the VNI Tag associated with the tenant. Tenants are free to use any Layer 2 or Layer 3 address they want in their virtual network. These addresses are not visible in the core network and cannot affect packet routing. The routing of the tenant data is done via two different routing techniques at different layers. The first routing, the virtual routing, is done at Layer 2, in the core network, through the VNI tag and some rules in the RBridges. An RBridge can only send a packet toward another RBridge or an end host if they share the same VNI tag. The second routing is done at Layer 3, and is dependent on both the tenant’s network and the tenant’s Layer 3 endpoint configuration.

This overlay technique allows an isolation of tenants’ data thanks to a VNI Tag in the TRILL header. This VNI Tag is a 24-bit value which allows for approximately 16 million different tenants, which is better than the limit of 4096 VLANs. The VNI Tag is also used to establish a unique tree topology for each virtual network associated with this VNI Tag, using the Intermediate System to Intermediate System (IS-IS) protocol [70, 71]. This way the data tagged with a VNI will only be propagated along this tree and will not be sent to other tenants’ hosts, which ensures tenant isolation. Moreover, the packets are routed based on the VNI tag in the physical network, which isolates the address space of each tenant, so that a tenant can freely choose its Layer 3 and Layer 2 addresses. However, VNT being based on TRILL, it is impossible to interconnect multiple data centers without merging their control planes into one, resulting in losing each data center’s independence and increasing the broadcast domain. To prevent the merging of TRILL networks when they are interconnected and to keep each data center’s control plane independent, the Multi-Level TRILL Protocol with VNT (MLTP/VNT) solution has been developed. This solution, described in , mostly improves TRILL scalability, thus allowing for a better use of VNT.
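The VNT forwarding rule and header arithmetic described above can be illustrated with a minimal sketch. It is not the authors’ implementation: the `Neighbor` class and the function names are hypothetical, and the only facts taken from the text are the 24-bit VNI width, the sizes of the header blocks, and the rule that an RBridge forwards only toward a neighbor sharing the frame’s VNI tag.

```python
from dataclasses import dataclass, field

VNI_BITS = 24                 # width of the VNI field given in the text
VNI_COUNT = 1 << VNI_BITS     # number of distinct VNI values
VLAN_COUNT = 1 << 12          # 4096, the VLAN limit quoted for comparison

# Header blocks as listed in the text, in bits.
VNT_HEADER_BLOCKS = {
    "basic TRILL header": 64,
    "options criticality block": 32,
    "reserved (18) + flow ID (14)": 18 + 14,
    "VNT extension (TLV block)": 64,
}


@dataclass
class Neighbor:
    """Hypothetical stand-in for an adjacent RBridge or end host."""
    name: str
    vnis: set = field(default_factory=set)  # VNI tags configured on this neighbor


def check_vni(vni: int) -> int:
    """Reject values that do not fit in the 24-bit VNI field."""
    if not 0 <= vni < VNI_COUNT:
        raise ValueError(f"VNI {vni} does not fit in {VNI_BITS} bits")
    return vni


def may_forward(frame_vni: int, neighbor: Neighbor) -> bool:
    """Forward only toward a neighbor that shares the frame's VNI tag."""
    return check_vni(frame_vni) in neighbor.vnis


def vnt_header_bits() -> int:
    """Sum of the header blocks listed in VNT_HEADER_BLOCKS."""
    return sum(VNT_HEADER_BLOCKS.values())


if __name__ == "__main__":
    rb2 = Neighbor("rbridge-2", {7, 9})
    print(VNI_COUNT, VLAN_COUNT)
    print(vnt_header_bits())
    print(may_forward(7, rb2), may_forward(8, rb2))
```

Under these assumptions the blocks add up to 64 + 32 + 32 + 64 = 192 bits, and 2^24 = 16,777,216 identifiers are available, which is the "approximately 16 million" figure compared against the 4096 VLAN IDs.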
(one row per 32-bit word)
| Ethertype = TRILL | V | R | M | Op-Length | Hop Count |
| Egress RBridge Nickname | Ingress RBridge Nickname |
| Crit. | CHbH | NCHbH | CRSV | NCRSV | CItE | NCItE |
| Reserved | Flow ID |
| APP | * | Type | # | Length | Reserved |
| VNI | Reserved |
* = NC    # = MU

Figure 24: VNT Header

5.2.2 Core isolation for both Layer 2 and Layer 3 Cloud DC

The protocol introduced in this section uses the Core isolation technique and works over a Layer 2 and/or Layer 3 network.

184.108.40.206 VSITE

In , the VSITE architecture is proposed in order to allow enterprises (companies) to have seamless Layer 2 extensions in the cloud. In the paper, the tenants are considered to be exclusively enterprises that need to expand their networks. VSITE defines this extension as the collection of resources of an enterprise within a data center, and calls it a virtual stub network, or vstub. The enterprise customer edge (CE) switch communicates with the cloud edge switch to exchange MAC information via an OTV-like protocol. For communication over the public network, VSITE uses Ethernet over IP (Ethernet over GRE (Section 188.8.131.52) or EtherIP  protocol) to transport data between the cloud edge switch and the CE. This OTV-like protocol uses a control plane protocol to exchange MAC addresses among sites, which eliminates the cross-site MAC flooding. To suppress this flooding without modifying the VMs’ behavior, the hypervisor is tasked to intercept all the VMs’ DHCP and ARP messages.

In VSITE there is no global VLAN ID assigned to enterprises but rather, local VLAN IDs at the data center edge location (between the switch and the VMs connected to it). The VLAN IDs are not statically assigned to enterprises. Therefore, the cloud edge switch has to map the VLAN ID of the enterprise to a locally significant unique VLAN ID for the traffic from the enterprise’s network to the VMs. The reverse operation has to be done at the hypervisor for the traffic from the VMs to the enterprise’s network. To ensure isolation of tenants’ data, VSITE encodes the tenants’ ID in the MAC addresses. The hypervisor will then ensure the traffic isolation by verifying that the VM receiving the packet belongs to the enterprise through the tenant ID in the MAC address. The hypervisor, by checking the tenant id, must either accept or drop the packet. The MAC address (48 bits long) is divided in two:

1. X bits for a tenant’s id
2. 48 - X bits for the VM’s id

where the value of X is the administrator’s choice.

The core network of the data center can be a Layer 2 or a Layer 3 network. In a Layer 2 network situation, the MAC-in-MAC encapsulation technique allows for a location MAC address (locMAC). With a Layer 3 network, the packet is encapsulated in an IP packet with a location IP address (locIP). The locIP or locMAC are location addresses assigned to a VM. Each VM possesses a location address which allows the separation of its name and location. The name of the VM is the IP address assigned by the enterprise, the enterprise IP (entIP). The location of the VM is indicated by the IP address, or the MAC address, of the switch to which the VM is logically connected. However this logical connection must be done via Ethernet. This location address is used to route the packet in the core network. All locIP (or locMAC) addresses are stored in a directory server. Because of this, a VM or data center edge has to send a lookup request to the directory server to retrieve the locIP of the destination if the information is not already in its local cache. The directory server maintains the mapping between entIP and pertinent information including locIP, MAC, and potentially, a VLAN ID.

The VSITE architecture has a centralized control plane and relies on hypervisor security to provide protection against MAC address spoofing, a VM impersonating a different VM, or a DDoS attack from a VM. It also uses Core isolation protocols in both its edge domains (VLANs) and its core network (MAC-in-MAC or IP-in-IP). However, data transported over the public network is not protected.

5.2.3 Core isolation for Layer 3 Cloud DC

The protocols introduced in this section use the Core isolation technique and require a Layer 3 network.

220.127.116.11 VRF: Virtual Routing and Forwarding

Virtual Routing and Forwarding (VRF), described in , is a technology included in routers. This technology allows multiple instances of a routing table to exist and work at the same time in a router. With these multiple routing tables it is possible to segment the network and separate users in different routing table instances. The Customer Edge (CE) router does the local routing and then exchanges the information with the Provider Edge (PE) router. The PE router creates a VRF instance with the information from this CE. This new VRF instance is used by the PE for every packet to and from this CE. To each CE corresponds a VRF instance in the PE. This allows for the creation of a Virtual Private Network (VPN). The PE router creates at least one routing table instance for each VPN. Then the PE routes the packet from each VPN by searching in the corresponding VRF instance. The separation of the VRF instances provides a secure access to all the devices in a specific VRF instance. It also allows the use of the same or overlapping IP addresses without conflict. The VRF method is intended to help the ISPs provide private networks to their clients while using the same physical infrastructure. These private networks, over shared infrastructures, are called Virtual Private Networks (VPNs). It is as if the whole network of each client is tunneled. This solution is based on MPLS for the routing of the packets in the core network. It is not used in data centers.

18.104.22.168 VXLAN: Virtual eXtensible Local Area Network

Virtual eXtensible Local Area Network (VXLAN), detailed in "draft-mahalingam-dutt-dcops-vxlan-09" , is being developed in order to expand the VLAN method and remove the VLAN limit of 4096. VXLAN allows overlaying a Layer 2 network on a Layer 3 physical network. VXLAN tunnels begin and end at VXLAN Tunnel EndPoints (VTEPs). The ingress VTEP encapsulates the packet and adds a VXLAN header of 8 bytes to the packet. The header (Figure 25) is composed of 8 bits of Flags, 1 bit for each flag, with the I flag, the 5th one, set to 1 for a valid VXLAN Network ID (VNI). Then comes a 24-bit long reserved field. It is followed by the VXLAN Network Identifier (VNI) field, 24 bits long, used to identify which individual VXLAN overlay network the packet belongs to. Finally the header ends with another reserved field with a length of 8 bits. Once the VXLAN header is added, the packet is then encapsulated within a UDP header. This UDP header must use the value 4789 as destination port. This value has been assigned by the IANA as the VXLAN UDP port. The source port is provided by the VTEP. Both headers are removed at the egress VTEP.

One advantage of VXLAN is that it expands the VLAN technology with a larger number of possible VXLAN segments. On the other hand, VTEPs must not fragment encapsulated VXLAN packets, and if such a packet has been fragmented along the path it must be silently discarded by the egress VTEP. As UDP is used to encapsulate the data and a packet must be discarded silently, the source does not know that its packet has been discarded. There is no security measure, so it is recommended to use IPsec to add security mechanisms.

Outer Ethernet header: (144 bits)
Outer IP Header: (IPv4=160 or IPv6=320 bits)...
Outer UDP header: (64 bits)
VXLAN header: (64 bits)
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|            Reserved                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Flags (8 bits) - where the I flag MUST be set to 1 for a valid VXLAN Network ID (VNI). The other 7 bits (designated "R") are reserved fields and MUST be set to 0 on transmit and ignored on receive.
Segment ID/VXLAN Network Identifier (VNI) - this is a 24-bit value used to designate the individual VXLAN overlay network on which the communicating VMs are situated. VMs in different VXLAN overlay networks cannot communicate with each other.
Reserved fields (24 bits and 8 bits) - MUST be set to 0 on transmit and ignored on receive.

Figure 25: VXLAN header

6 Network isolations provided by several cloud tools

This section gathers a succinct list of several tools used in cloud deployment which provide network isolation while relying on already established tunneling protocols.

6.1 Cisco guide

In  tenant isolation is done thanks to path isolation and device virtualization. Path isolation is as if the packet from this path went through a tunnel over the network (Figure 3). The device virtualization is achieved by creating virtual devices such as VMs or virtual switches.
Path isolation is done thanks to several techniques which provide an independent logical path over a shared infrastructure. To create these paths, two technologies, over different layers, are mainly used:

1. Layer 2 separation with VLAN
2. Layer 3 separation with VRF

Separation at Layer 2 is done thanks to Virtual Local Area Network (VLAN) presented in Section 22.214.171.124. At Layer 3, the separation is done with Virtual Routing and Forwarding (VRF) presented in Section 126.96.36.199. This solution works for small multi-tenancy clouds, with fewer than 4096 tenants, which corresponds to the VLAN limit. However in today’s cloud data centers with virtualization, there is a need to host more than 4096 tenants, so this solution is not scalable enough.

In  tenant isolation is done with additional logical mechanisms such as virtual Network Interface Controllers (vNICs), IPsec or SSL Virtual Private Networks, packet filtering, and firewall policies. The Security Architecture for the Internet Protocol (IPsec) is a security architecture focused on IP-Layer security, as mentioned in "RFC 4301" . The Secure Sockets Layer (SSL) Protocol is designed to provide privacy and reliability between two communicating applications as described in "RFC 6101" .

6.2 OpenStack software

OpenStack manages tenant isolation thanks to the use of VLANs or Layer 2 tunneling with GRE, as stated in the OpenStack Security Guide . For the Layer 2 tunneling, OpenStack Networking currently supports GRE (Section 188.8.131.52) and VXLAN (Section 184.108.40.206). However the support of VXLAN is added via the Neutron Modular Layer 2 (ml2) plugin. The security guide explains that "the choice of technology to provide L2 isolation is dependent upon the scope and size of tenant networks that will be created in your deployment". Indeed if the environment has a large number of Layer 2 networks (i.e., more than 4096 tenants, each with their own sub-network), the VLAN limit could be reached and no more tenants could be added. Therefore it is better to use the tunneling method with GRE or VXLAN.

6.3 OpenFlow controller

OpenFlow  has a centralized control plane and uses an OpenFlow controller. This OpenFlow controller has a global view of the network and decides what is best for the network. It then sends its decisions to the compatible OpenFlow network switches. These switches can be hardware or software. Belonging to this last category, Open vSwitch is a virtual switch for hypervisors. It provides network connectivity to virtual machines. Open vSwitch works via a flow table which defines rules and actions for each flow.

In order to isolate the flows from each tenant, OpenFlow uses VLANs (Section 220.127.116.11) and GRE (Section 18.104.22.168) as a tunneling protocol. However Open vSwitch can also manage the Stateless Transport Tunneling (STT) protocol (Section 22.214.171.124) inherited from Nicira development.

6.4 Amazon’s Virtual Private Cloud (VPC)

Amazon’s VPC  is a proprietary solution so we only have scarce information. It provides a Layer 3 abstraction with a full IP address space virtualization. It is possible to create up to 200 sub-networks per VPC and to have up to 5 VPCs per customer. In order to communicate with Amazon’s VPN you must have:

• The ability to establish IKE Security Association using Pre-Shared Keys (RFC 2409) .
• The ability to establish IPSec Security Associations in Tunnel mode (RFC 4301) .
• The ability to establish Border Gateway Protocol (BGP) peering (RFC 4271).
• The ability to utilize IPSec Dead Peer Detection (RFC 3706) .
• The ability to adjust the Maximum Segment Size of TCP packets entering the VPN tunnel (RFC 4459) .
• The ability to reset the "Don’t Fragment" flag on packets (RFC 791) .
• The ability to fragment IP packets prior to encryption (RFC 4459) .

However we do not know what happens inside the VPC. We could not find any documentation of how it is implemented either, and hence cannot comment on its isolation techniques.

7 Comparison

In this section, a comparison between fifteen of the solutions (protocols and architectures) previously introduced in Section 5 will be presented. The solutions being compared are: 1. LISP, 2. NVGRE, 3. STT, 4. 802.1ad, 5. 802.1ah, 6. VXLAN, 7. Diverter, 8. PortLand, 9. SEC2, 10. BlueShield, 11. VSITE, 12. NetLord, 13. VNT, 14. VL2, 15. DOVE. This comparison is made by using six criteria. The first criterion is the complexity of use of the solution. The second is the overhead induced by each solution. Then we compare the solutions’ capability to migrate VMs, followed by a comparison of their resilience. The fifth criterion is scalability, and finally, we study if it is possible and easily manageable to have multiple data centers.

7.1 Complexity comparison

To determine the complexity of a technique we take into account six criteria:

1. Centralized/Distributed control plane
2. Network restrictions
3. Tunnel configuration and establishment
4. Tunnel management and maintenance
5. Multi-protocol
6. Security mechanism

Table 2 summarizes this comparison.

7.1.1 Centralized/Distributed control plane

The first criterion is whether the technique has a centralized control-plane or not. Among the presented solutions, eight of them have a centralized control-plane. These solutions are LISP, PortLand, SEC2, BlueShield, VSITE, NetLord, VL2 and DOVE. They all possess a key component which is used to resolve addresses. PortLand, SEC2, BlueShield, VSITE, VL2, and DOVE architectures use this centralized controller to also maintain the rules allowing tenant traffic isolation. However these architectures mitigate the consequences of a failure of this key component through replication.
This replication increases complexity because those replicas must all possess the same information, which implies a synchronization of these components. The other architectures do not possess such a key component and possess a distributed control-plane. The failure of a single device will not compromise the entire architecture. However they need to store redundant, and sometimes unused, information in every switch or VM to do the address resolution, whereas with a centralized control plane approach the switches only get the information they need from the key component on an on-demand basis.

The centralized control plane design also has other issues. For example, with the PortLand architecture, in a data center with 27,648 end hosts (not tenants) where each host makes 25 ARP requests per second, the fabric manager will need approximately 15 CPU cores working full-time just to manage the ARP requests. In addition, in PortLand, the fabric manager only manages ARP requests, whereas the other architectures, which also use a centralized controller, additionally manage rules and policies, which increases the workload of this component. To mitigate this increased workload, redundancy of the centralized controller and local caches in switches are used. There is a need for synchronization between all these components. To prevent data routed with outdated information from the local caches from reaching a tenant network, SEC2, BlueShield, VSITE and VL2 use local modules (FEs, Echelon VMs, hypervisors, and VL2 agents respectively) to enforce rules and to drop packets that do not conform to these rules.

7.1.2 Network restrictions

Only three solutions, 802.1ad, 802.1ah, and VSITE, do not impose restrictions on the underlying network. Then there are architectures that impose a Layer level on the network such as SEC2 and BlueShield which need a Layer 2 network, or DOVE and LISP which need a Layer 3 network.
Three architectures need a specific topology such as a Layer 2 multi-rooted fat tree topology for PortLand, a Layer 3 Clos topology for VL2 and a flat Layer 2 network for Diverter. Two protocols (NVGRE and VXLAN) require that the Layer 3 underlying network does not fragment packets and, for VXLAN, that an IGMP querier function is enabled. For NetLord, the network must be a Layer 2 network but with edge switches supporting IP routing, which is why NetLord is classified as a Layer 2-3 architecture. VNT uses the TRILL header, so it needs the edge switches to support the TRILL protocol and a TRILL or Layer 2 core network. Finally, STT is the trickiest: it only needs a Layer 3 network, but it uses a modified TCP header, so STT traffic must be allowed to transit through all the middleboxes of the network.

In this category we started with solutions that do not impose restrictions on the underlying network, continued with architectures needing a specific Layer level, and finished with architectures needing a very specific topology or protocol. In order to use these latter architectures, the underlying network might have to be heavily modified, which could be difficult or even impossible to do on already established infrastructures.

7.1.3 Tunnel configuration, establishment, management and maintenance

BlueShield, Diverter, and PortLand are exceptions because they do not use encapsulation protocols. Instead, to provide isolation, BlueShield uses a Directory server and ARP requests to verify whether a VM can communicate with a different VM. If communication is permitted in the Directory server rules, then the address resolution is possible and the directory server responds to the ARP request. If not, the directory server does not reply and the address resolution fails, therefore data is not sent between the VMs. For Diverter the VNET replaces the VM’s MAC address with the server’s MAC address, and verifies whether the communication is allowed.
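The BlueShield behaviour described above, where isolation comes from withholding the answer to an address resolution rather than from encapsulation, can be sketched as follows. This is a toy model under stated assumptions, not BlueShield’s actual interface: the `DirectoryServer` class, its method names, and the choice to treat a rule as bidirectional are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


class DirectoryServer:
    """Toy directory server that answers a lookup only if a rule permits it."""

    def __init__(self) -> None:
        self._mac_by_ip: Dict[str, str] = {}
        self._allowed: Set[Tuple[str, str]] = set()

    def register(self, ip: str, mac: str) -> None:
        """Record the MAC address of a VM."""
        self._mac_by_ip[ip] = mac

    def allow(self, src_ip: str, dst_ip: str) -> None:
        """Permit communication between two VMs.

        Assumption: a rule applies in both directions.
        """
        self._allowed.add((src_ip, dst_ip))
        self._allowed.add((dst_ip, src_ip))

    def resolve(self, src_ip: str, dst_ip: str) -> Optional[str]:
        """Return the destination MAC, or None to model 'no reply'.

        Without a reply the sender never learns the destination MAC,
        so no data can be sent between the two VMs.
        """
        if (src_ip, dst_ip) not in self._allowed:
            return None
        return self._mac_by_ip.get(dst_ip)


if __name__ == "__main__":
    ds = DirectoryServer()
    ds.register("10.0.0.1", "02:00:00:00:00:01")
    ds.register("10.0.0.2", "02:00:00:00:00:02")
    ds.register("10.0.0.3", "02:00:00:00:00:03")
    ds.allow("10.0.0.1", "10.0.0.2")
    print(ds.resolve("10.0.0.1", "10.0.0.2"))  # permitted pair
    print(ds.resolve("10.0.0.1", "10.0.0.3"))  # no rule, so no reply
```

The design point the sketch tries to capture is that the isolation decision is made once, at resolution time, in a central place; nothing is added to the packets themselves, which is why BlueShield has no encapsulation overhead.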
Eight solutions (LISP, NVGRE, STT, VXLAN, SEC2, NetLord, VNT and VL2) among the other twelve use an implicit tunnel configuration and establishment for the core network part. They do not exchange messages or make reservations to establish the tunnel. The other three use an explicit tunnel configuration. Those three solutions are the same three which do not impose restrictions on the network. 802.1ah and 802.1ad both use the GARP VLAN Registration Protocol (GVRP), which is a GARP application, in order to distribute the VLAN registration information in the network. The last of the three, VSITE, uses the MPLS VPN protocol on the public network, which uses an explicit tunnel configuration.

Even if tunnel establishment could potentially be implicit, for twelve of the solutions, tunnel maintenance and management must still be explicit. LISP uses address mappings in its ITR (Ingress Tunnel Router) and ETR (Egress Tunnel Router). 802.1ad and 802.1ah both use join and leave messages sent by end stations and bridges. VXLAN also uses join and leave messages. However they are sent by VTEPs with the goal of keeping the distribution tree of each VNI updated to reach all the clients of this VNI. Diverter, SEC2, BlueShield, VSITE, and VL2 each maintain rules for isolation which need to be managed and updated according to tenant requirements, or in case of VM migration. To enforce these rules, they all define agent modules detailed in Section 7.2.3. PortLand uses soft states to maintain the tunnel and VNT uses a temporary forwarding database in its RBridges. NetLord uses a SPAIN agent to manage the tunnel. The two which do not have tunnel management are NVGRE and STT. For DOVE, tunnel management is dependent upon the encapsulation protocol.

7.1.4 Multi-protocol and security mechanism

Only PortLand and VNT are multi-protocol. NVGRE and BlueShield accept Layer 2 protocols because they use the MAC address to forward the packet at the last hop to the correct VM.
STT, 802.1ad, 802.1ah, VXLAN, VSITE, and NetLord solutions also use the MAC address for delivering the packet but the protocol must be Ethernet. LISP, Diverter, SEC2 and VL2 use the IP address instead of the MAC to deliver packets, therefore the protocol must be IP.

About security mechanisms, these architectures do not define any encryption, authentication or integrity verification techniques. However five of them (Diverter, SEC2, BlueShield, VSITE, VL2) have security mechanisms for tenant isolation thanks to their agents and directory services which enforce pre-established isolation policies. In BlueShield, the Echelon VM (the agent) can also be associated with a firewall to improve security by processing the traffic only after it traverses the firewall.

7.2 Overhead comparison

To determine overhead we list the encapsulation headers, messages, and components used by each architecture. All these elements are summarized in Table 3.

7.2.1 Encapsulation

Among the fifteen solutions presented, one (BlueShield) does not use any encapsulation or address rewriting, and two (Diverter and PortLand) do not use encapsulation either, instead rewriting the address of the packet. For Diverter, the MAC address of the virtual machine is replaced by the MAC address of the physical remote node. In PortLand, the Actual MAC (AMAC) is replaced by the Pseudo MAC (PMAC). The others use encapsulation.

LISP defines its own header of 64 bits and also uses both UDP and IP headers as outer headers. So when the packet enters a LISP tunnel, a header of 288 bits, for the IPv4 version, or a header of 448 bits, for the IPv6 version, is added to the packet.

NVGRE has its own header (64 bits) and then encapsulates the packet within an outer IP header (IPv4: 160 bits, IPv6: 320 bits) and an outer MAC Ethernet header (144 bits), making a total of 368 bits in IPv4 or 528 bits in IPv6. STT encapsulates Ethernet messages with an STT header (144 bits), then with a TCP-like header (192 bits), an outer IP header (IPv4: 160 bits, IPv6: 320 bits) and finally an outer Ethernet header (144 bits), for a total cost of 640 (IPv4) or 800 (IPv6) bits. VXLAN also defines its own 64-bit header for encapsulation, and does not rewrite addresses.

802.1ad modifies the MAC header by adding an S-TAG of 32 bits, after both MAC addresses, followed by a C-TAG of 32 bits. After these modifications, the header size is increased by 64 bits. The 802.1ah header is increased by a complete MAC header. As such we have a MAC destination (48 bits), a MAC source (48 bits), a B-TAG (32 bits), and an I-TAG (48 bits) for a total of 176 bits. However it is possible to use the 802.1ah standard with 802.1Q frames or with 802.1ad frames. In the first case we have to add the 802.1Q header, which is 32 bits long, and so the new header is 208 bits long. In the second case, the new header is increased by the S-TAG and the C-TAG from the 802.1ad standard and is now 240 bits long.

DOVE does not specify an encapsulation protocol but proposes to use one of the three previously presented protocols (NVGRE, VXLAN, or STT). SEC2 and VSITE use MAC encapsulation with a header of 18 * 8 = 144 bits. In addition VSITE changes the destination address with a locMAC address. However VSITE can be deployed over a Layer 3 network, so instead of MAC encapsulation it can use IP encapsulation (160 bits in IPv4, 320 bits in IPv6) and replace the destination address with a locIP address. VL2 also uses an IP encapsulation and rewrites the destination address with a LA (Location Address). As VNT is deployed over a TRILL or a MLTP network, it uses a modified version of the TRILL header. This modified version is 192 bits long. NetLord uses both MAC and IP headers to encapsulate the data, which creates an overhead of 304 bits in IPv4 and 464 bits in IPv6. NetLord is the solution which has the largest header overhead.

7.2.2 Messages

Neither NVGRE nor STT exchanges messages. This is explained by the fact that these two solutions are tunneling protocols and they only define an encapsulation technique. Then, BlueShield only uses lookup requests, from the BlueShield agent to the Directory Server, to resolve addresses and suppress ARP broadcasts. Diverter does not provide a specific type of message, and resolves addresses using the ARP protocol.

802.1ad and 802.1ah both use the Generic Attribute Registration Protocol. VXLAN requires its VTEP to manage the distribution tree of each VNI by sending join and leave messages for each VNI.

VNT is based on a TRILL or a MLTP network, which implies the use of TRILL messages. As the control plane in TRILL and in MLTP is based on the IS-IS protocol, in order to route frames it uses an SPF (Shortest Path First) tree topology generated by Link State PDU (LSP) messages.

DOVE, like all the other solutions with a centralized control plane (PortLand, SEC2, BlueShield, VSITE, NetLord, and VL2), must store the mapping between VMs’ addresses and dSwitches’ addresses in the DOVE Policy Service (DPS). All the servers of the DPS have to exchange information to be synchronized. DOVE dSwitches must also make unicast requests to retrieve the information from the DPS.

SEC2 needs communication between Forwarding Elements (FE) and the Central Controller (CC) to first save the mapping between the VM addresses and the FE addresses. Then the FEs have to intercept the ARP requests of the VMs and convert these to unicast lookup requests sent to the CC. However, it is possible that there is more than one CC, and all of them must possess the same information. This implies the need for synchronization messages between CCs. The other possibility is that every FE sends information to all CCs. Additionally, SEC2 uses VLAN, so it needs to use the Generic Attribute Registration Protocol.

NetLord is based on the Diverter model for resolving addresses, but it also needs unicast messages between SPAIN agents and the repository to obtain the table that maps destinations and sets of VLANs. In case of a topology change, new messages must be sent to update this table.

VL2 uses registration messages to store the association between AAs and LAs in the directory system. To resolve addresses the VL2 agents send lookup requests to the directory service. However there might be multiple directory services so they must be synchronized. In addition, for LA address assignment, VL2 uses an IP-based link state routing protocol.

VSITE uses an OTV-like protocol to exchange MAC addresses between CEc (Customer Edge cloud), CEt (Customer Edge tenant) and the directory server. This protocol allows the elimination of the cross-site MAC learning flooding. As an architecture with a centralized control plane, VSITE has to perform Directory lookups for address resolution. For these queries, unicast messages are sent from the CEc or VSITE agent to the Directory server. For address resolution between CEt and CEc they exchange MAC reachability information using the OTV control plane protocol.

PortLand has four functionalities that need messages. The first is the registration of a new source MAC address, as seen at the ingress switch, at the fabric manager to save the mapping between the PMAC, the MAC, and the IP addresses. To resolve addresses, PortLand intercepts the ARP requests of the VMs and converts them into unicast lookup requests to the fabric manager. However, if the fabric manager cannot answer one of the lookup requests, it must broadcast an ARP request to all end hosts. In addition, PortLand switches periodically send a Location Discovery Message (LDM) out of all their ports, both to identify their position, and to perform health checks. Finally, it is possible to have fabric manager synchronization messages in the case of redundant fabric managers.

LISP uses five different messages. The LISP Map-Request is used to request a mapping for a specific EID, to check the reachability of an RLOC, or to update a mapping before the expiration of the TTL. However in , it is RECOMMENDED that a Map-Request for the same EID-Prefix be sent no more than once per second. The second message is a LISP Map-Reply which is the answer to the LISP Map-Request. This message returns an EID-prefix whose length is at most equal to the EID-Prefix requested. The LISP Encapsulated Control Message (ECM) contains the control packets of the xTRs and also the mapping database system from . Also defined in , the messages LISP Map-Register and LISP Map-Notify are used to manage the xTR/EID-Prefixes associations.

7.2.3 Components

Every solution presented in this survey carries at least one new networking dependency. Only five of them define exactly one component. LISP needs an xTR component. This component is the device at each end of the tunnel. It must function both as an Egress Tunnel Router (ETR) and as an Ingress Tunnel Router (ITR). Additionally the xTR needs to work as a Proxy ETR (PETR) and as a Proxy ITR (PITR) in order to connect LISP to non-LISP sites. Diverter introduces VNET, a software module which resides within the host OS on each physical node. Then NVGRE, VXLAN, and STT need NVGRE Endpoints, VXLAN Tunnel Endpoints (VTEP), and STT Endpoints. All three are modules in switches, servers, or hypervisors, which encapsulate and decapsulate the packets.

With two new components each, PortLand, VNT, VL2, and DOVE belong to the same group. PortLand, VL2, and DOVE use a centralized controller. PortLand defines a Fabric Manager, VL2 a Directory System, and DOVE a DOVE Policy Service. To apply the rules stored in these centralized controllers, PortLand uses edge switches which must be able to perform MAC to PMAC header rewriting. VL2 defines a VL2 agent added to the hypervisor and DOVE defines dSwitches which are the edge switches of the DOVE overlay network. On the other hand, VNT does not use such a centralized controller. Instead VNT uses RBridges, which provide the advantages of both Layer 2 (bridges) and Layer 3 (routers), and Virtual Switches (VS). A VS is dedicated to host all interfaces tagged with a particular VNI corresponding to a tenant ID.

SEC2 and VSITE define three components each. SEC2 uses a Central Controller (CC) but this device could possibly be on several servers for redundancy or load balancing. To enforce the rules of the CC, Forwarding Elements (FEs) are introduced. An FE is a switch that intercepts ARP requests from VMs and encapsulates data if necessary. The third component is a web portal, where each customer can set up security policies for their network, which then translates them into policy settings saved in the CC. VSITE uses a Directory server to save the address associations. It presents a component called CEc (Customer Edge cloud) which is in the cloud data center. This CEc encapsulates the Ethernet frames received from the tenants’ private networks with an IP header. The IP destination address is the address of the Top of the Rack switch which hosts the Ethernet frame’s destination device. This device is in the tenant vstub. This CEc prevents the overlapping of VLAN IDs from multiple companies by translating this VLAN ID into a locally unique one. For an Ethernet frame from cloud VMs, the translation is done at the VSITE agent in the hypervisor. The Ethernet frame is then encapsulated with an outer IP header.

NetLord uses a NetLord Agent (NLA) implemented at each physical server to encapsulate the data with an IP header and then with an Ethernet header. To do load balancing when sending packets, the NLA uses a SPAIN agent that is implemented in the NLA. The third component, an edge switch, is not really a new one. However, this edge switch must be able to read the IP header of the packet. The last new component is a configuration repository which maintains all the configurations of the tenant virtual networks.

Like the other architectures with a centralized control plane, BlueShield uses a Directory Server and an agent, called BlueShield Agent, to enforce the rules of the Directory server.

... are interested in here is the live migration which allows for a continuity of service and session even while the VM is being moved. We summarize this comparison in Tables 4 and 5.

LISP RFC  defines five types of mobility, however only three of them concern endpoint migration:

1. Slow Endpoint Mobility: An endpoint migration without session continuity, which uses "RFC 4192" .
2. Fast Endpoint Mobility: An endpoint migration with session continuity.
3. LISP Mobile Node Mobility: An xTR migration.

Among these three types of mobility, only the last two are of interest to us for this comparison. For the Fast Endpoint Mobility, the solution is to use the technique of home and foreign agents. The home agent, the endpoint’s original agent, redirects traffic to the foreign agent of the network to which the endpoint moved. This technique is defined in "RFC 5944"  for IPv4 and in "RFC 6275"  and "RFC 4866"  for IPv6. However the last migration, the LISP Mobile Node Mobility, allows the migration of a device without the need for agents. As the device is itself an xTR, it can use topologically independent EID IP addresses. Thus it only has to register itself at the Map-Servers and Map-Resolvers of the network. This last solution is explained in .

NVGRE is an encapsulation and tunneling protocol. Its original goals were to increase the number of VLAN subnets, the VLAN technology being limited to 4096 subnets, and to achieve a multi-tenant environment.  states that NVGRE achieved its goals. However, NVGRE left the management of VM migration to IP
For security measures, an Echelon VM because of its use of a UDP header. In order to imis introduced and is in fact a VM that scans the traffic prove the management of VM migration, the draft  to apply added actions such as sending the traffic flow defines new extensions for the control plane of NVGRE. through a firewall. To suppress ARP flooding, a virtual Among these extensions, one is interesting for host miswitch is installed in the server and converts the ARP gration. The REDIRECT message is in fact the origirequests to unicast directory lookups. An additional nal message, or at least the maximum data of the origicomponent, ’ebtables’ firewall, has been used to block nal message a REDIRECT message could contain, sent all broadcast and multicast traffic. back to the sender. This REDIRECT message is sent Both 802.1ad and 802.1ah require that all the de- by the old NVE, where the endpoint was hosted before vices of the network adhere to their respective stan- migrating. The data of the returning packet starts with dard. These two solutions being IEEE standards, the the address of the new NVE managing the endpoint. devices are not modified by the network administrator This address is 32 bits long and is the first information but by the manufacturers of these devices in order to in the payload of the packet. Then follows a copy of comply with the standard. as much data as possible of the original message. This way the sender now knows the address of the new NVE. However it is not specified how long the old NVE must 7.3 Migration of VM comparison maintain the information of the VM migration. The migration of a VM is an important task in a virAs NVGRE, STT is an encapsulation and tunneltualized data center. When a server needs to be shut ing protocol. However as opposed to NVGRE, there down for maintenance, the VMs on this server must are no STT mechanisms for VM migration. 
STT is not be stopped so they have to be moved to another working with IP and it uses IP mechanisms to manage server. If a client’s location changes, it might be interVM migration, but it must also use the IP mobility esting to move their VM accordingly. VM migration mechanism of STT. can be done in two different ways, an offline migration 802.1ad and 802.1ah manage VM migration the same or a live migration. The offline migration will stop the service by terminating the session and establishing a way thanks to the GARP VLAN Registration Protonew session once it has finished migrating. The one we col (GVRP). When a VM moves, it must send a GVRP 26 message to the closest switch in order to indicate that the VLAN announced in the message is of interest for this machine. This way the VLAN tree will reach the VM. As the VM does not change its IP address, the connection is not lost. However, in order to keep the connection, the VLAN must be deployed in the destination device ahead of time in order to already have the distribution tree of this VLAN reach this device. Even with this advance deployment the migrating VM must stay in the same Layer 2 network. VXLAN is a tunneling technique that allows a VM to migrate even to another network across a Layer 3 network. To do so, VXLAN uses join and leave messages destined to VTEP in order to indicate which distribution tree the VTEP must associate with. As for 802.1ad or 802.1ah, in order to have session continuity, the destination VTEP must be informed ahead of time that it must join the distribution tree requested by the migrating VM. Diverter was designed to increase isolation between tenants’ networks without degrading the overall performance of the network. A VM uses a virtual IP address which is created based on the Farm and Sub-network it belongs to. This IP address is formatted as follows:10.Farm.Subnet.Host. So in Diverter, all the VMs of one client belongs to the same sub-network and this sub-network is in one Farm. 
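The 10.Farm.Subnet.Host format described for Diverter can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of Diverter; the sketch also assumes that each of the three fields occupies exactly one octet, which is the default layout implied by the dotted format.

```python
import ipaddress
from typing import Tuple


def diverter_address(farm: int, subnet: int, host: int) -> str:
    """Return the virtual IP 10.Farm.Subnet.Host as a dotted string."""
    for name, value in (("farm", farm), ("subnet", subnet), ("host", host)):
        # Assumption: each field occupies exactly one octet (0-255).
        if not 0 <= value <= 255:
            raise ValueError(f"{name}={value} does not fit in one octet")
    return str(ipaddress.IPv4Address(f"10.{farm}.{subnet}.{host}"))


def parse_diverter_address(address: str) -> Tuple[int, int, int]:
    """Split a 10.Farm.Subnet.Host address into (farm, subnet, host)."""
    first, farm, subnet, host = ipaddress.IPv4Address(address).packed
    if first != 10:
        raise ValueError(f"{address} is not in the 10.0.0.0/8 range")
    return farm, subnet, host


def same_tenant_subnet(a: str, b: str) -> bool:
    """True when two VMs share both Farm and Sub-network."""
    return parse_diverter_address(a)[:2] == parse_diverter_address(b)[:2]
```

For instance, `diverter_address(3, 7, 42)` yields `10.3.7.42`, and two VMs of the same client compare equal on their (Farm, Sub-network) pair regardless of the physical server hosting them.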
If a VM migrates to another server, this new server will now have to extend the Farm and the Sub-network. However, to discover the mapping between IP and MAC addresses, the VNET ARP engine uses multicast ARP, so the migration of the VM is not detected at the beginning of the migration, but only when the VNET ARP engine sends an ARP query and the response has been received from the new server. If an existing connection was established between the VM pending migration and another VM, this connection will be interrupted at the beginning of the migration. The VNET of the non-migrating VM will continue to associate the MAC address of the old server, where the migrating VM was hosted, to the traffic of this session. This traffic will be lost until the VNET ARP cache entry times out and the VNET does an ARP query to retrieve the new MAC address. Nevertheless, since the IP address stays the same, the session may survive even though the traffic is lost during the migration and until the VNET learns the new MAC address. The session continuity depends on two parameters: the ARP cache entry timeout and the TCP timeout. If the first one is longer than the second, then the session is lost and there is no live migration. On the other hand, if the TCP timeout is longer than the ARP cache entry timeout, the new MAC address will be retrieved before the end of the TCP session, thus providing session continuity and live migration.

PortLand defines Layer 2 messages to be sent when a VM migrates. Additionally, VMs' IP addresses remain unchanged during the migration. Thus, PortLand manages live migration as well as session continuity. These messages are sent only after the VM migration. The first message is a gratuitous ARP, sent by the migrated VM, which contains the new IP to MAC address mapping. It is forwarded to the fabric manager, which then forwards an invalidation message intended for the old switch of the migrated VM. Upon reception of this message, the old switch sets up a flow table entry to trap the packets destined for the VM which has migrated. Additionally, when such a packet is received at the old switch, it sends back a unicast gratuitous ARP to give the new PMAC address of the migrated VM. Optionally, to limit the loss of packets, the old switch can transmit the trapped packet to the VM.

As the SEC2 architecture is composed of multiple edge domains, where the VMs are hosted, and one core domain, which interconnects these edge domains, there are two ways a VM can migrate. In the first case, the VM stays in the edge domain. The migration consists of transferring a dynamic VM state from a source host to a destination host. Once the transfer is complete, a gratuitous ARP message is sent by the destination host to announce the VM's new location. This ARP message is sent only within the VLAN inside the edge domain, and can only reach hosts in this VLAN. However, if the VM migrates to a different edge domain, then the Central Controller (CC) has to update the VM's location in its table, including both eid and VLAN id. Since the IP address is not modified and the MAC address change is announced by the gratuitous ARP, the migration is done without losing the session continuity and so SEC2 can perform live migration.

In , it is said that BlueShield allows live migration of protected VMs. It is possible because BlueShield uses a Layer 2 core network and addressing scheme. As the IP address of the VM is untouched, the continuity of the session is preserved. However the process of migrating a VM is not clearly defined in the solution. We can guess that, as each VM needs to have a BlueShield agent which manages ARP queries, this same agent must warn the directory server of the migration of the VM and provide the new MAC address. This way the directory server informs the other VMs of the new address of the migrated VM. The echelon VM could manage the current traffic address replacement.

VSITE manages VM live migration thanks to the MAC learning mechanism and by using a location IP address, which is the IP of any Layer 3 switch that interconnects data center edge and core. As such, a VM migration can be considered "live" if it takes place inside one data center edge. This migration does not modify the IP address of the VM because the VM is still connected to the same Layer 3 switch and no routing updates are required. However, if the VM does migrate to another Layer 3 switch, then the location IP address is changed and the migration is no longer "live". Thus the directory service, both edge routers, and the server's hypervisor configuration must be updated.

When a VM starts or migrates in NetLord, the NetLord agent (NLA) in the hypervisor of the corresponding server will have to broadcast an NLA-HERE message to report the location of the VM to the other NLAs. The NL-ARP table entries are permanent, so a single message is sufficient to update the ARP table. However if the broadcast is lost, the ARP table does not have the correct information. Additionally, if packets for the migrated VM are already sent, then upon arrival of those packets, the server, which no longer hosts the destination VM, has to reply with a unicast NLA-NOTHERE message. When receiving an NLA-NOTHERE message, the NLA will broadcast an NLA-WHERE message in order to retrieve the correct MAC address for the migrated VM. The IP address of the VM remains unchanged throughout the migration so the session remains uninterrupted. In this way, NetLord can do live VM migration.

VNT is based on the TRILL protocol, which routes the messages thanks to Layer 2 nicknames. A VM is associated with an RBridge nickname in the core network. A message for a VM is modified by the ingress RBridge, which routes the frame to the egress RBridge associated with the destination VM.
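The association between a VM and an RBridge nickname can be pictured as a lookup table consulted by the ingress RBridge. The sketch below is only an illustration under that assumption: the class, its methods, the MAC address, and the nicknames are hypothetical and do not come from the VNT or TRILL specifications.

```python
from typing import Dict


class NicknameTable:
    """Hypothetical VM-to-RBridge association kept by an ingress RBridge."""

    def __init__(self) -> None:
        self._egress_by_vm: Dict[str, str] = {}

    def associate(self, vm_mac: str, rbridge_nickname: str) -> None:
        # Record (or overwrite) the RBridge behind which the VM sits.
        self._egress_by_vm[vm_mac] = rbridge_nickname

    def egress_for(self, vm_mac: str) -> str:
        # The frame is routed on this nickname, not on the VM's own addresses.
        return self._egress_by_vm[vm_mac]


table = NicknameTable()
table.associate("52:54:00:aa:bb:01", "RB-7")
```

Calling `associate` again with another nickname is the only state change in this model; the VM's MAC address, used as the key, is never rewritten.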
When a VM migrates in TRILL, the only modification is the association between an RBridge nickname and the VM, except if the VM remains in the domain managed by the RBridge. The IP address and MAC address of a VM are not used for routing or forwarding purposes, so they remain unchanged during a VM migration, thereby preserving the session continuity and realizing a live migration. Huawei, in , even qualifies the migration of VMs with TRILL as "Smooth VM migration".

VL2, like TRILL, uses an addressing scheme which separates the server address and the addresses used for routing purposes. The server addresses are called application-specific addresses (AAs) and are not modified when a VM migrates. The modified address is the location-specific address (LA), which is the one used for routing the packets. Each AA address is associated with an LA address and it is the VL2 directory system which manages those associations. When a VM migrates and changes its AA/LA association, it is the directory system which must update the mapping and thus must inform the other VMs which want to communicate with the migrated VM. Since neither the IP nor the MAC address of the VM changes during the migration process, the session is uninterrupted and the migration is performed live.

DOVE works with tunneling protocols such as STT, VXLAN, and NVGRE. These protocols decouple the logic domain from the physical infrastructure while respecting machine mobility, address space isolation, and multitenancy. However, to handle VM migration and address resolution for these migrated VMs, a dSwitch must, upon detection of a newly-hosted VM, send a location update to the DPS. This allows the DPS to update its address resolution information.

7.4 Resilience comparison

In this section we look at techniques such as redundancy, multipath and backup that the solutions provide in order to manage failures.

LISP resilience is done through redundancy of its components. More than one CE (Customer Edge) router with LISP capabilities can be used, which translates to more than one xTR with the same IP address. Thus the RLOC becomes an anycast address and if one of the xTRs fails then the traffic is automatically routed to the other with the same address. To manage these redundant xTRs, we have two parameters. The first is priority; the higher the priority value, the less favorable. The second is weight; if two xTRs share the same priority then the traffic is divided according to their weight. For example, if xTR1 has a weight of 10 and xTR2 a weight of 5, then the traffic ratio will be 2:1, with xTR1 receiving twice as much traffic as xTR2. Additionally, the Mapping Server (MS) and Mapping Resolver (MR) are key components. To assure resilience, backup devices may be needed. When using multi-homed sites, with multiple xTRs, it is no longer possible for the site to control its point of entry when the anycast route is advertised. The scope of advertisement is also reduced to /32 (/128 in IPv6) prefixes.

NVGRE is an encapsulating and tunneling protocol. There is no resilience in NVGRE because the sole function of NVGRE, encapsulation, is done by an important element, the hypervisor. In the event of failure of this element, the packet cannot reach the destination because it cannot pass the hypervisor. It is possible to add resilience on the path by using multipath techniques such as ECMP (Equal-Cost Multipath) or , but this is not included in NVGRE.

STT's use of a TCP-like packet that lacks all the TCP functionalities results in the loss of IP datagrams in the event of congestion, or when a router on the path is not STT-enabled. In this case the router will drop the packet. In order to prevent such an undetected packet loss, the solution is to use a real TCP header in the outer IP header. As with NVGRE, it is possible to use ECMP in addition to STT for better path resilience. However it is necessary that all packets of the same flow follow the same path and that all paths are used efficiently. STT endpoints are designed to be virtual switches running in software. These endpoints are mostly inside the servers and so each server is an endpoint. There is no need for endpoint redundancy, since if the endpoint is down then the server is down, and so are the VMs.

The resilience in 802.1ad and 802.1ah can be obtained by aggregating links. If one link is down, connectivity is maintained by the remaining backup links. In a similar way, redundant switches bring resilience to the network. Therefore in 802.1ad and 802.1ah resilience is accomplished by network hardware redundancy.

Like NVGRE, VXLAN is a tunneling technique. VXLAN endpoints, the VTEPs, are located within the hypervisors of each server hosting VMs. VXLAN does not define resilience techniques. Hypervisor failure is managed through the use of redundant servers and the migration of VMs to those backup servers. However the session continuity might be interrupted. Session continuity could be maintained if a faulty hypervisor is detected prior to total failure, thereby allowing a backup server to join this VNI and permitting live VM migration. This solution is not defined in  and is only a possible solution to enhance VXLAN resilience.

Since Diverter uses a Layer 2 core network, it is possible to use ECMP in order to increase resiliency of the path. Additionally, a Farm's virtual gateway is distributed in all the VMs of this Farm, so there is no risk that the failure of the virtual gateway will block all communications with this Farm's VMs. However if a server is down then the Farm must be replicated on another server. As there is no live migration in Diverter, all the connections must be re-established.

PortLand's core network is based on a multi-rooted fat-tree topology, which increases link capacity at the tree summit, and uses the ECMP protocol. Additionally there is redundancy at the aggregation and core level switches. However the most important element in the PortLand solution is the fabric manager. If the fabric manager fails then address resolution is no longer possible. For this reason the fabric manager should be replicated. These backups however do not need to be exact replicas, since the fabric manager does not maintain a hard state.

SEC2 uses multiple FEs per site in order to increase reliability. Also the Central Controller (CC) can have a backup if needed. Additionally the CC can become a Distributed Controller (DC). As client networks are managed independently from each other, it is possible to have several controllers, each managing different client networks. However this solution increases the administration complexity and may result in having a backup for each small controller.

There is no specific technique for resilience defined in VSITE. However a server can be multi-homed to multiple top-of-rack switches. In this case there must be a master switch to handle the locIP. This master or slave configuration of the switches is done with the virtual router redundancy protocol . As VSITE is a solution with a centralized controller (the Directory server), it might be necessary to replicate this controller.

BlueShield has a centralized controller, the directory server, whose role is to resolve addresses and to enforce isolation rules. If this device fails then there will be no communication between VMs, as they will be unable to obtain each other's MAC address. To prevent such a situation, the directory server is replicated over several devices. Also, to improve reliability, a BlueShield agent will send its queries to several directory server devices in order to have at least one answer. If a BlueShield agent, being located in each VM, fails then logically the VM itself will have failed. The same is true for the vSwitches and ebtables, as they are located in each server. BlueShield imposes a Layer 2 network but nothing else, so it is possible to use ECMP in this Layer 2 network.

In order to benefit from a high-bandwidth resilient multipath fabric using Ethernet switches, NetLord relies on SPAIN . Like the other solutions using a centralized controller, it might be necessary to have redundant configuration repositories, not only for availability but also for improvement in performance. NetLord's agents are all located inside the hypervisors of each physical server, so for NLA redundancy we must have server redundancy.

VNT, being a distributed solution, has no need for a redundant centralized controller. Additionally, VNT uses ECMP for multiple paths, so even if an RBridge fails, the traffic is sent to another RBridge. Each VNI (a.k.a. tenant) can have its own multicast distribution tree and it is possible to configure a backup tree if needed.

The Clos topology, used by VL2, provides a simple and resilient topology. Routing in such a topology is done by taking a random path up to a random intermediate switch and then taking another random path down to a destination ToR switch. However VL2 has a centralized controller (the directory server) which must be replicated. Otherwise, in case of failure, address resolution would be impossible.

The use of tunneling protocols in DOVE provides multipath capabilities and routing resiliency. The key component of DOVE, the DOVE Policy Service (DPS), must be resilient to ensure high availability. It maintains the information of the network in order to resolve dSwitch policy requests. The DPS should additionally be replicated and have multiple backups.

7.5 Scalability comparison

In virtualized data centers, and virtualized environments in general, the goal is to share the infrastructure among the maximum number of clients, thus scalability is an important criterion.

As mentioned in , the separation of the Endpoint Identifiers (EIDs) and Routing Locators (RLOCs) in LISP allows for a better scalability through a greater aggregation of RLOCs. However new limits are imposed, notably one RLOC address having $2^{32} = 4294967296$ possible EIDs in IPv4 and $2^{128} \approx 3.4 \times 10^{38}$ in IPv6. These limits are also applicable for the number of possible RLOCs, so we can see that the address space is scalable. However the MS and MR are hardware components with memory and CPU limitations. MS and MR must store mapping information between RLOCs and EIDs, but seeing as there are, for one RLOC with an IPv4 addressing, approximately 4 billion EID addresses, we can conclude that the scalability issue lies within the MS and MR components. For example in  the maximum number of NAT translations stored is 2147483647, which is only half of the number of EIDs in IPv4. This limitation is based on a theoretical maximum within the Cisco IOS XE operating system, and not upon any physical hardware limitation.

The use of NVGRE endpoints allows the representation of multiple Customer Addresses (CA) by only one Provider Address (PA). This way the core network routers have fewer addresses to store and manage. Also the sizes of MAC address tables at the Top of Rack switches are reduced. It is also possible to increase scalability by implementing proxy ARP at each NVGRE endpoint to prevent most broadcast traffic and convert the rest to multicast traffic, which reduces the load on the control plane. To prevent most broadcast traffic, NVGRE endpoints must be placed within the hypervisor. The VSID field, in the NVGRE header, is 24 bits long so there are $2^{24} = 16777216$ virtual Layer 2 networks.

In STT the core network only knows the IP addresses of each virtual switch, one virtual switch per server at most. In the worst case scenario, there is only one VM per server, so there is the same number of virtual switches as VMs, and since it uses an IP-based address scheme, it has a similar scalability as IP, including the ability to aggregate. However, usually a server hosts more than one VM so we have greater scalability. Additionally, with a Context ID of 64 bits it is possible to have $2^{64} \approx 1.8 \times 10^{19}$ IDs. Consequently, scalability limitations reside in the virtual switches, which are unable to manage so many IDs. Another limit is the number of VMs per server. This last limit can be mitigated if we change the STT endpoint location. By using a dedicated device in front of several servers, the number of VMs managed by this device will be higher than if the STT endpoint was in the server itself. However, this implies additional network hardware.

Both 802.1ad and 802.1ah standards are evolutions of the 802.1Q standard with the VLAN solution. As such, they both increase the limit of VLANs from 4096 to $2^{12} \times 2^{12} = 16777216$ for 802.1ad and to $2^{12} \times 2^{20} = 4294967296$ for 802.1ah, and with the optional fields to $2^{12} \times 2^{20} \times 2^{12} \times 2^{12} \approx 7 \times 10^{16}$. The issue here lies with the switches, which have to manage this number of VLANs. A switch is incapable of managing so many VLANs, so in order to reduce this number, join and leave messages ensure that the switch manages only the VLANs needed by the endpoints.

VXLAN uses a VXLAN Network Identifier (VNI) which is 24 bits long so there are $2^{24} = 16777216$ VNIs. However VXLAN works at the software level, which impacts overall performance because hardware offload is not possible. Whether or not this is important, draft  attempted to address this question. They observed increased CPU usage and an unstable throughput of 5.6 Gb across a 10 Gb network. However those results must be taken with a grain of salt because, as stated in the paper, the tests were realized using only one server when the design and purpose of VXLAN is for a multi-server environment.

With its virtual IP addressing scheme, Diverter manages up to 16 million VMs system-wide. However the solution for IP addressing is more restrictive with regards to the number of tenants. With an IP of the type 10.F.S.H we have 255 distinct farms, with 255 subnets in each farm. If we consider that each client uses one Sub-network then we have a maximum of $255 \times 255 = 65025$ clients in total. This limit can be modified, as stated in , since this address scheme is the default and may be modified prior to network deployment. With this in mind, we are faced with another scalability issue: determining in advance how many farms, subnets, and hosts would be required. On the other hand, this addressing technique allows the core switches to only see one MAC address per server, which reduces the size of the MAC forwarding table.

As discussed in Section 126.96.36.199, PortLand is a solution with a centralized control plane. The fabric manager, a single machine, must manage all ARP traffic for the whole network, rendering this architecture unscalable in a large data center. For example in a data center with 27,648 end hosts (not tenants), each making 25 ARP requests per second, the fabric manager would need approximately 15 CPU cores working full time just to handle the ARP requests. Additionally there is no notion of tenant isolation in the solution. In order to provide isolation, rules could be enforced by the fabric manager. For example, the fabric manager will only respond to ARP queries when allowed by policies. However, doing so increases overhead on the fabric manager, thereby further decreasing the number of end hosts that can be managed. A possible solution would be to have additional fabric managers, each serving fewer end hosts, thereby reducing the load on each fabric manager, but in  this solution is preceded by "it should be possible", so additional fabric manager configuration may be necessary.

Like PortLand, SEC2 is a solution using a centralized control plane. The Central Controller is the key element for address resolution and for rules and isolation enforcement. The results of the CC can be extrapolated from the PortLand fabric manager results. In fact those results might be "worse" seeing that the CC has more actions to do when processing an ARP request than the fabric manager. In order to reduce the load, the SEC2 CC can become a Distributed Controller with each device managing some client networks. This way we can increase the number of edge domains in the data center. Another limitation is the number of client networks in each edge domain. As the tenant isolation in edge domains is done thanks to VLANs, the limit is 4096 tenants per edge domain. However, the number of tenants within the DC is limited by the number of edge domains multiplied by the number of VLANs per edge domain. The limit of edge domains is the maximum number of MAC addresses. As long as there are free MAC addresses we can add edge domains. One edge domain is associated with all its FEs' MAC addresses.

BlueShield improves its scalability by suppressing all VMs' ARP broadcasts and converting them to unicast directory server lookups. However the PortLand experience can be used as a reference for this solution. We saw that for ≈ 27000 end hosts each sending 25 ARP requests per second, the centralized controller will need approximately 15 CPU cores working non-stop to manage these requests. In order to overcome this limitation, BlueShield uses redundant directory servers to share the load. Nevertheless, contrary to SEC2, in BlueShield each directory server must have the same information. Even if we increase the number of directory servers in order to alleviate the CPU load, we will have another limitation imposed by the physical memory of the device.
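The PortLand figures reused above (27,648 end hosts, 25 ARP requests per second per host, about 15 CPU cores) can be turned into a back-of-the-envelope estimate. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the controller load scales linearly with the number of requests, which the source does not state.

```python
import math


def arp_request_rate(hosts: int, requests_per_host_per_s: int) -> int:
    """Total ARP requests per second reaching the centralized controller."""
    return hosts * requests_per_host_per_s


def cores_needed(hosts: int, requests_per_host_per_s: int,
                 requests_per_core_per_s: float) -> int:
    """Cores required, assuming load grows linearly with the request rate."""
    rate = arp_request_rate(hosts, requests_per_host_per_s)
    return math.ceil(rate / requests_per_core_per_s)


# Figures quoted in the text: 27,648 hosts, 25 requests/s each, ~15 cores.
total_rate = arp_request_rate(27_648, 25)
per_core_rate = total_rate / 15  # derived value, not given in the source
```

With these inputs the controller receives 691,200 requests per second, which corresponds to roughly 46,000 requests per second per core. Under the same linearity assumption, doubling the number of end hosts doubles the number of cores required.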
Additionally, the directory server must not only save the ARP information but also the rules indicating which VMs can exchange data. As a consequence of the quantity of information the DS must look through, the latency is increased.

Using locIP based on the Layer 3 switch virtual IP, VSITE can aggregate multiple VMs under one IP address. The VMs' MAC addresses are only known inside the data center edges. This allows for smaller table sizes in core network routers as they only learn the locIP addresses. However, like the other solutions using a centralized control plane, one scalability limit is given by the capacities of the directory server, which must store both IP addresses (the locIP and the real IP), MAC addresses, and VLANs, and must resolve address queries. Additionally, VSITE uses VLANs for client isolation and therefore imposes a limit of 4096 virtual networks.

Concerning scalability, NetLord uses an IP encoding which gives 24 bits for the Tenant_ID value. With 24 bits it is possible to have $2^{24} = 16777216$ simultaneous tenants. The encapsulation scheme prevents Layer 2 switches from seeing and saving all Layer 2 addresses. These Layer 2 switches see the local Layer 2 addresses and the addresses of all the edge switches. The authors of  estimate that NetLord can support $N = V \times R \times \sqrt{F/2}$ virtual machines, where V is the number of VMs per physical server, R is the switch radix, and F is the MAC forwarding information base (FIB) size (in entries). In Table 1, they presented results for V = 50. Additionally NetLord uses multipath technology based on SPAIN and so achieves a throughput similar to that of machine-to-machine communication.

The VNT solution based on TRILL has the same scalability advantages as TRILL. The core RBridges only learn the nicknames of the other RBridges. An edge RBridge aggregates multiple VMs' MAC addresses under its nickname. So RBridge forwarding database sizes are reduced compared to classical Ethernet forwarding databases.  states : "...
unicast forwarding tables of transit RBridges to be sized with the number of RBridges rather than the total number of end nodes ..." It is also true if VNT is used with MLTP. Additionally, VNT introduces a VNI TAG to separate the virtual networks. This TAG is 24 bits long so it can accommodate 224 = 16777216 virtual networks which should be sufficient for the next few years. The scalability of the VL2 solution is limited by the capacity of the directory server. In order to increase scalability, VL2 uses additional directory servers. These additional directory servers improve the maximum lookup rate. Some experimental results are given in . In those experiments, the goal was to process the most lookup requests possible, while ensuring sub-10ms latency for 99% of the requests. They found that a directory server can manage 17000 lookups/sec and that the lookup rates increase linearly with the increase of servers. In the worst case scenario, chosen in , 100000 servers simultaneously performing 10 lookup requests requires 60 servers in the directory system. We can conclude that the scalability limitation of VL2 comes from its directory system. DOVE’s scalability is achieved thanks to the tunneling protocol it uses. As such the choice of the tunneling protocol is bound by several attributes. Among them are the interoperability and scalability attributes. Those attributes define that the protocol must used genuine headers for delivery and it must adapt to different underlays. However the scalability issue is located in the DPS. As DOVE is a solution with a centralized control plane, we have the same issue of having the centralized controller being overloaded by the amount of policy requests. So the DPS must be scalable but the means to achieve this are not specified in the article. 
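Two of the capacity estimates quoted in this section can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, the NetLord estimate uses the formula $N = V \times R \times \sqrt{F/2}$ with "K" taken to mean 1024 FIB entries, and the VL2 estimate assumes that all lookups arrive within the same second and that capacity scales linearly with the number of directory servers.

```python
import math


def netlord_max_vms(vms_per_server: int, switch_radix: int, fib_entries: int) -> float:
    """NetLord estimate N = V * R * sqrt(F / 2)."""
    return vms_per_server * switch_radix * math.sqrt(fib_entries / 2)


def vl2_directory_servers(servers: int, lookups_per_server: int,
                          lookups_per_sec_per_directory_server: int) -> int:
    """Directory servers needed if every server issues its lookups in the same second."""
    total_lookups = servers * lookups_per_server
    return math.ceil(total_lookups / lookups_per_sec_per_directory_server)


# V = 50 VMs per server, radix-24 switches, 32K-entry FIB (K = 1024 is an assumption).
print(netlord_max_vms(50, 24, 32 * 1024))
# 100,000 servers, 10 lookups each, 17,000 lookups/s per directory server.
print(vl2_directory_servers(100_000, 10, 17_000))
```

The first call yields 153,600, which matches the corresponding Table 1 entry; the second yields 59, close to the 60 directory servers quoted for VL2.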
Table 1: NetLord worst-case limits on unique MAC addresses (from the NetLord paper)

Switch radix    FIB size 16K    FIB size 32K    FIB size 64K    FIB size 128K
24              108,600         153,600         217,200         307,200
48              217,200         307,200         434,400         614,400
72              325,800         460,800         651,600         921,600
94              425,350         601,600         850,700         1,203,200
120             543,000         768,000         1,086,000       1,536,000
144             651,600         921,600         1,303,200       1,843,200

7.6 Multi data center comparison

Multi data center interconnection is of interest because there are often multiple physical facilities for a given virtual data center. For this reason, we identify whether the proposed solutions have inherent multi data center capabilities.

LISP is suited to multiple data centers as long as each data center is a LISP site with at least one xTR and an RLOC address. In this case, all devices in a data center have EID addresses that are associated with the RLOC address of that data center's xTR. In fact, a draft examines the best possible deployment of LISP in a data center, and its Section 5 discusses data center interconnection over a WAN. However, if some data centers are not LISP-enabled, then we need to refer to RFC 6832, which describes how an interconnection between a LISP site and a non-LISP site is possible and implemented. This RFC introduces three such mechanisms. One uses a new network element, a LISP Proxy Ingress Tunnel Router (Proxy-ITR), installed at the non-LISP site. Another adds a layer of Network Address Translation (NAT) at the xTR. The last also uses a new network element, a Proxy Egress Tunnel Router (Proxy-ETR).

NVGRE can be used like a site-to-site VPN. To do so, each site needs a VPN gateway which supports NVGRE. These gateways then establish a tunnel between them and respectively encapsulate and decapsulate the sent and received packets.

As mentioned in the cited source, "STT deployments are almost entirely limited at present to intra-data center environments". This is explained by the fact that STT uses a TCP-like header that has the same fields as a TCP header but not the same functionalities. As such, middle boxes which have no knowledge of STT will drop the packets. That is why, for now, STT is only used in environments where the same administrative entity can manage all the middle boxes so that they process STT packets. So for now, even if STT can theoretically be used like a site-to-site VPN, it is not practically feasible.

802.1ad and 802.1ah can interconnect data centers if the network between them is a Layer 2 network. Most of the time, however, it is a Layer 3 network, so the frames need to be encapsulated in IP. All the switches in the data center and in the Layer 2 network must respect the 802.1ad or 802.1ah standards. As both are standards, all recent switches from manufacturers support them.

VXLAN is designed mostly for intra data center communication; however, it is possible to use it like a site-to-site VPN with VXLAN gateways at each site. Additionally, a draft proposes the use of Ethernet VPN (E-VPN) technology to interconnect VXLAN sites. It could also be used to interconnect NVGRE sites. However, this solution imposes the use of IP/MPLS networks between the sites.

Diverter does not specify any inter data center communication techniques. Nevertheless, each farm hosts multiple subnets, each with multiple hosts, and we can extrapolate by saying that a farm could represent a data center. Seen this way, the only requirement for interconnecting multiple data centers with Diverter would be a Layer 2 connection between those data centers. However, doing so would result in poor scalability. Additionally, all the control traffic would have to reach all the VNETs in all data centers, which might be a costly use of the interconnection links.

PortLand is based on a fat tree topology, which is a data center topology. So in order to use PortLand for inter data center communication, it is mandatory to have a Layer 2 fat tree topology between the data centers. If interconnecting the core switches of each data center via a Layer 2 fat-tree topology is achievable, then we could build a large-scale PortLand network spanning multiple data centers. That said, multi data center connectivity is not discussed in the solution brief.

By design, SEC2 is already multi-domain. We have a core domain which interconnects several edge domains. We could see the edge domains as data centers and the core domain as a Layer 2 interconnection between the data centers. Additionally, we need a centralized controller reachable by all forwarding elements (FEs) in all data centers. If we use a distributed controller, then each member device must also be reachable by the FEs. The only issue is scalability. As tenant isolation in edge domains is done via VLANs, each data center will have at most 4096 VLANs, which is insufficient for virtualized data centers.

The BlueShield solution is based on preventing address resolution by blocking ARP queries and converting them to unicast directory server lookups via the vSwitch. There is no notion of multi data center in the paper; however, we can imagine a simple solution with a directory server replicated in each data center, which manages all the rules for inter data center communication throughout the network.

By design, VSITE interconnects multiple client sites to a data center. In the same manner, we can also interconnect data centers. To manage this, it is necessary to increase the directory server capabilities to match the increase in information it stores. This directory server will have to store the information concerning all VMs in the network. Also, each data center will need at least one cloud data center in order to implement an OTV-like protocol, which exchanges MAC reachability information with other cloud data centers and the directory server.

NetLord does not address the multi data center issue. However, it is possible to interconnect multiple data centers and as a result implement a larger network. All these data centers will share the same control plane, meaning that all control messages will travel across the public interconnection to reach all the data centers. This implies that the configuration repository will have to store the information for all data centers, which presents a potential scalability problem. Additionally, this interconnection will have to be done with a tunnel that transports Layer 2 messages.

The TRILL protocol is multi-data-center-ready, and therefore so is VNT. However, to manage this multi data center network, the solution used by TRILL is to have one big network with one control plane shared among the data centers. This is not scalable, given that a nickname has only 16 bits ($2^{16} = 65,536$ nicknames in total) and that all nicknames must be unique. This also means that the interconnection of those TRILL data centers must be done using site-to-site tunnels. However, when using the MLTP/VNT solution, the merging issue no longer exists, as each data center control plane remains independent. Additionally, MLTP introduces a new nickname management scheme which increases the number of available nicknames to more than one billion. Nevertheless, even when using MLTP, the interconnection of MLTP data centers must be done using site-to-site tunnels.

The multi data center issue is not discussed in VL2. It might however be possible to interconnect multiple data centers. The directory system could be externalized in order to manage the whole network, which could span multiple data centers.

Like VL2, DOVE does not address the multi data center issue. Nevertheless, it might be possible to have the DOVE solution span multiple data centers. However, this means that the DPS will have to manage the requests of even more dSwitches from all the data centers.
This way the control plane is shared among all the data centers and the DPS redundancy is the new scalability limit.

8 Discussion

Tunneling solutions based on a centralized controller need to tackle the scalability issue. Current solutions with a centralized control plane need a centralized controller with substantial processing power, and such devices are costly. One solution could be to aggregate multiple devices into a centralized controller in order to share the load. However, those devices have to be synchronized. Another solution could be to have smaller interconnected data centers. However, the interconnection between data centers is not really addressed. Even if some architectures are multi-site by design, they have scalability issues inside each site which prevent the site from being a data center. For those architectures, the solution might be found in using another tunneling protocol that enables better scalability within a site.

The next possible extension of cloud computing is the hybrid cloud. Gartner expects that hybrid cloud will be adopted by almost half of the largest companies by the end of 2017. In this type of cloud, tenants' traffic will cross the public network between the data center network and a tenant's private network. As in the public cloud, tenants will want their traffic to be isolated from other tenants and from other entities in general.

The presented solutions improve tenant data security thanks to traffic isolation, achieved by respecting rules and forwarding tables. However, isolation is only a part of security. Security is an area to improve, as isolation is not sufficient to guarantee integrity and prevent data theft. For example, if a corrupt or faulty component does not respect these rules, then tenants' traffic isolation is compromised. Another case could be that a malicious user or even a data center employee might illegally access a central component.
It could allow them to arbitrarily implement any rules they desire and thereby override isolation rules, even if network components correctly adhere to them. An attacker could also mount a man-in-the-middle attack or intercept the traffic. Moreover, as data centers are increasingly virtualized, with VMs migrating across data centers to improve performance or out of necessity, there are more and more tenants, and their data passes through more and more devices, which increases the risk to the data. Additionally, now that hybrid cloud is growing, work must be done to isolate traffic end to end between the tenant's private network and the cloud's infrastructure. Any solution must also take into account that the security requirements of one tenant may differ considerably from those of another. This demonstrates a need to manage several security mechanisms and policies, which increases the complexity of the network. In addition, there are potential conflicts between the policies of intrusion detection systems, belonging to service or infrastructure providers, and those of firewalls, which need to be resolved.

Another topic to tackle is the fact that Layer 2 solutions mostly use Spanning Tree (STP), Rapid Spanning Tree (RSTP), or Multiple Spanning Tree (MSTP) to prevent loops, thus rendering a number of links unusable and reducing the overall performance of the data center. To prevent that, a solution could be to use Layer 2 multipath technology. TRILL possesses such functionality, and other Layer 2 solutions could use Shortest Path Bridging (SPB), specified in the IEEE 802.1aq standard.

Another area of improvement for Layer 2 solutions is CPU offloading. Some Layer 2 solutions add a new header which is not yet recognized by Network Interface Cards (NICs), so the processing of the packet is done by the CPU, which consumes additional resources and decreases overall performance.
A practical solution could be to program the offloading of these new headers in NICs or to distribute traffic across multiple CPUs.

9 Conclusion

Data centers are being used more and more frequently. This is especially true of cloud data centers, whose workloads, representing 39% of total data center workloads, will continue to grow, up to 63% of total data center workloads by 2017. The advantage of a cloud data center is that it hosts multiple tenants to increase infrastructure efficiency and reduce costs. However, this raises issues such as the isolation of tenants' traffic.

In this paper, we surveyed fifteen solutions that provide tenant traffic isolation in a cloud network. We first presented them and then compared their complexity, the overhead they induce, their ability to manage VM migration, their resilience, their scalability, and their multi data center capabilities. Each solution provides tenant traffic isolation using a different approach; however, these solutions are not all multi-data-center-ready, and those that are have potential issues with scalability. Nevertheless, the VNT solution based on TRILL derives multi data center capability from the work already done on TRILL, implementing control plane isolation in each data center and an interconnection network control plane, thereby increasing scalability [104, 105, 106, 72].

Finally, we identified some research areas which are not yet thoroughly discussed in these papers and are possible directions for future research. Isolation alone does not make tenant traffic safe enough. It may be necessary to implement other security mechanisms in order to provide better security. Data centers are increasingly being virtualized, with VMs migrating across these data centers to improve performance or out of necessity. However, the interconnection between data centers is not really addressed. When a multi data center technique is presented, there is a trade-off in the scalability of the solution.
Additionally, now that hybrid clouds are growing, work must be done to isolate traffic end to end between a tenant's private network and the cloud.

References

Saurabh Barjatiya and Prasad Saripalli. BlueShield: A layer 2 appliance for enhanced isolation and security hardening among multi-tenant cloud workloads. In Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing, UCC ’12, pages 195–198, Washington, DC, USA, 2012. IEEE Computer Society. http://dx.doi.org/10.1109/UCC.2012.21.
Li Li and Thomas Woo. VSITE: A scalable and secure architecture for seamless L2 enterprise extension in the cloud. In Secure Network Protocols (NPSec), 2010 6th IEEE Workshop on, pages 31–36. IEEE, 2010.
Jayaram Mudigonda, Praveen Yalagandula, Jeff Mogul, Bryan Stiekes, and Yanick Pouffary. NetLord: A scalable multi-tenant network architecture for virtualized datacenters. In Proceedings of the ACM SIGCOMM 2011 Conference, SIGCOMM ’11, pages 62–73, New York, NY, USA, 2011. ACM. http://doi.acm.org/10.1145/2018436.2018444.
Ahmed Amamou, Kamel Haddadou, and Guy Pujolle. A TRILL-based multi-tenant data center network. Computer Networks, 68(0):35–53, 2014. Communications and Networking in the Cloud. http://www.sciencedirect.com/science/article/pii/S1389128614000851.
Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. VL2: A scalable and flexible data center network. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, SIGCOMM ’09, pages 51–62, New York, NY, USA, 2009. ACM. http://doi.acm.org/10.1145/1592568.1592576.
Liane Lewin-Eytan, Katherine Barabash, Rami Cohen, Vinit Jain, and Anna Levin. Designing modular overlay solutions for network virtualization. Technical report, IBM, 2011.
Cisco global cloud index: Forecast and methodology, 2013-2018. Technical report, Cisco Systems, Inc, 2014.
http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html.
R. Cohen, K. Barabash, V. Jain, R. Recio, and B. Rochwerger. DOVE: Distributed overlay virtual network architecture, 2012.
D. Farinacci, V. Fuller, D. Meyer, and D. Lewis. The locator/ID separation protocol (LISP). RFC 6830, January 2013.
Rouven Krebs, Christof Momm, and Samuel Kounev. Architectural concerns in multi-tenant SaaS applications. In CLOSER, pages 426–431, 2012.
Murari Sridharan, Yu-Shun Wang, Albert Greenberg, Pankaj Garg, Narasimhan Venkataramiah, Kenneth Duda, Ilango Ganga, Geng Lin, Mark Pearson, Patricia Thaler, and Chait Tumuluri. NVGRE: Network virtualization using generic routing encapsulation. Work in progress, draft-sridharan-virtualization-nvgre-04, February 2014.
Bruce Davie and Jesse Gross. A stateless transport tunneling protocol for network virtualization (STT). Work in progress, draft-davie-stt-06, April 2014.
Institute of Electrical and Electronics Engineers. IEEE 802.1ad-2005. 802.1ad - Virtual Bridged Local Area Networks, 2005.
Institute of Electrical and Electronics Engineers. IEEE 802.1ah-2008. 802.1ah - Provider Backbone Bridges, 2008.
Mallik Mahalingam, Dinesh G. Dutt, Kenneth Duda, Puneet Agarwal, Lawrence Kreeger, T. Sridhar, Mike Bursell, and Chris Wright. VXLAN: A framework for overlaying virtualized layer 2 networks over layer 3 networks. Work in progress, draft-mahalingam-dutt-dcops-vxlan-09, April 2014.
Aled Edwards, Anna Fischer, and Antonio Lain. Diverter: A new approach to networking within virtualized infrastructures. In Proceedings of the 1st ACM Workshop on Research on Enterprise Networking, WREN ’09, pages 103–110, New York, NY, USA, 2009. ACM. http://doi.acm.org/10.1145/1592681.1592698.
Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat.
PortLand: A scalable fault-tolerant layer 2 data center network fabric. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, SIGCOMM ’09, pages 39–50, New York, NY, USA, 2009. ACM. http://doi.acm.org/10.1145/1592568.1592575.
Fang Hao, T. V. Lakshman, Sarit Mukherjee, and Haoyu Song. Secure cloud computing with a virtualized network infrastructure. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pages 16–16, Berkeley, CA, USA, 2010. USENIX Association. http://dl.acm.org/citation.cfm?id=1863103.1863119.
Cisco virtualized multi-tenant data center, version 2.0 compact pod design guide. Technical report, Cisco Systems, Inc, 2010. http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/VMDC/2-0/design_guide/vmdcDesignGuideCompactPoD20.pdf.
Cisco virtualized multi-tenant data center, version 2.2 design guide. Technical report, Cisco Systems, Inc, 2012. http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/VMDC/2-2/design_guide/vmdcDesign22.pdf.
Securing multi-tenancy and cloud computing. Technical report, Juniper Networks, Inc, 2012. https://www.juniper.net/us/en/local/pdf/whitepapers/2000381-en.pdf.
Steve Bobrowski. The force.com multitenant architecture. Technical report, salesforce.com, inc, 2013. http://s3.amazonaws.com/dfc-wiki/en/images/8/8b/Forcedotcom-multitenant-architecture-wp-2012-12.pdf.
Virtual network overview. https://msdn.microsoft.com/en-us/library/azure/jj156007.aspx.
N.M. Mosharaf Kabir Chowdhury and Raouf Boutaba. A survey of network virtualization. Computer Networks, 54(5):862–876, 2010.
A. Fischer, J.F. Botero, M. Till Beck, H. de Meer, and X. Hesselbach. Virtual network embedding: A survey. Communications Surveys Tutorials, IEEE, 15(4):1888–1906, Fourth Quarter 2013.
Tunneling - Cisco. http://www.cisco.com/c/en/us/products/ios-nx-os-software/tunneling/index.html.
What is a tunneling protocol?
http://usa.kaspersky.com/internet-security-center/definitions/tunneling-protocol.
VPN tunneling protocols. https://technet.microsoft.com/en-us/library/dd469817%28v=ws.10%29.aspx.
E. Rosen, A. Viswanathan, and R. Callon. Multiprotocol label switching architecture. RFC 3031, January 2001.
Li Heng, Yang Dan, and Zhang Xiaohong. Survey on multi-tenant data architecture for SaaS. IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 6, No 3, November 2012.
Stefan Aulbach, Torsten Grust, Dean Jacobs, Alfons Kemper, and Jan Rittinger. Multi-tenant databases for software as a service: Schema-mapping techniques. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD ’08, pages 1195–1206, New York, NY, USA, 2008. ACM.
Vivek Narasayya, Sudipto Das, Manoj Syamala, Badrish Chandramouli, and Surajit Chaudhuri. SQLVM: Performance isolation in multi-tenant relational database-as-a-service. In 6th Biennial Conference on Innovative Data Systems Research (CIDR ’13), 2013.
Ying Hua Zhou, Qi Rong Wang, Zhi Hu Wang, and Ning Wang. DB2MMT: A massive multi-tenant database platform for cloud computing. In e-Business Engineering (ICEBE), 2011 IEEE 8th International Conference on, pages 335–340, Oct 2011.
Xuequan Zhou, Dechen Zhan, Lanshun Nie, Fanchao Meng, and Xiaofei Xu. Suitable database development framework for business component migration in SaaS multi-tenant model. In Service Sciences (ICSS), 2013 International Conference on, pages 90–95, April 2013.
Wang Xue, Li Qingzhong, and Kong Lanju. Multiple sparse tables based on pivot table for multi-tenant data storage in SaaS. In Information and Automation (ICIA), 2011 IEEE International Conference on, pages 634–637, June 2011.
Bob Braden, Lixia Zhang, Steve Berson, Shai Herzog, and Sugih Jamin. Resource reservation protocol (RSVP) – version 1 functional specification. RFC 2205, September 1997.
Loa Andersson, Ross Callon, Ram Dantu, Paul Doolan, Nancy Feldman, Andre Fredette, Eric Gray, Juha Heinanen, Bilel Jamoussi, Timothy E. Kilty, and Andrew G. Malis. Constraint-based LSP setup using LDP. RFC 3212, January 2002.
Kathleen Nichols, Steven Blake, Fred Baker, and David L. Black. Definition of the differentiated services field (DS field) in the IPv4 and IPv6 headers. RFC 2474, December 1998.
L. Berger. Generalized multi-protocol label switching (GMPLS) signaling functional description. RFC 3471, January 2003.
E. Mannie. Generalized multi-protocol label switching (GMPLS) architecture. RFC 3945, October 2004.
E. Rosen and Y. Rekhter. BGP/MPLS IP virtual private networks (VPNs). RFC 4364, February 2006.
Jon Brodkin. VMware users pack a dozen VMs on each server, despite memory constraints. http://www.networkworld.com/article/2197837/virtualization/vmware-users-pack-a-dozen-vms-on-each-server-despite-memory-constraints.html.
Lori MacVittie. Virtual machine density as the new measure of IT efficiency. https://devcentral.f5.com/articles/virtual-machine-density-as-the-new-measure-of-it-efficiency.
Ahmed Amamou. Network isolation in a virtualized datacenter. PhD thesis, University Pierre and Marie Curie - Paris 6 EDITE of Paris, 2013. French thesis, Isolation reseau dans un datacenter virtualise.
Determine true total cost of ownership. http://www.vmware.com/why-choose-vmware/total-cost/virtual-machine-density.html.
Jinho Hwang, Sai Zeng, F.Y. Wu, and T. Wood. A component-based performance comparison of four hypervisors. In Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on, pages 269–276, May 2013.
Hasan Fayyad-Kazan, Luc Perneel, and Martin Timmerman. Benchmarking the performance of Microsoft Hyper-V server, VMware ESXi and Xen hypervisors. Journal of Emerging Trends in Computing and Information Sciences, 4(12), 2013.
Todd Deshane, Zachary Shepherd, J Matthews, Muli Ben-Yehuda, Amit Shah, and Balaji Rao.
Quantitative comparison of Xen and KVM. Xen Summit, Boston, MA, USA, pages 1–2, 2008.
Wei Jing, Nan Guan, and Wang Yi. Performance isolation for real-time systems with Xen hypervisor on multi-cores. In Embedded and Real-Time Computing Systems and Applications (RTCSA), 2014 IEEE 20th International Conference on, pages 1–7, Aug 2014.
Stan Hanks, Tony Li, Dino Farinacci, and Paul Traina. Generic routing encapsulation (GRE). RFC 1701, October 1994.
Dino Farinacci, Tony Li, Stan Hanks, David Meyer, and Paul Traina. Generic routing encapsulation (GRE). RFC 2784, March 2000.
K. Hamzeh, G. Pall, W. Verthein, J. Taarud, W. Little, and G. Zorn. Point-to-point tunneling protocol (PPTP). RFC 2637, July 1999.
Rich Miller. A look inside Amazon’s data centers. http://www.datacenterknowledge.com/archives/2011/06/09/a-look-inside-amazons-data-centers/.
Rich Miller. Who has the most web servers? http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/.
Kireeti Kompella. New take on SDN: Does MPLS make sense in cloud data centers? http://www.sdncentral.com/use-cases/does-mpls-make-sense-in-cloud-data-centers/2012/12/.
Jayaram Mudigonda, Praveen Yalagandula, Mohammad Al-Fares, and Jeffrey C. Mogul. SPAIN: COTS data-center Ethernet for multipathing over arbitrary topologies. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI’10, pages 18–18, Berkeley, CA, USA, 2010. USENIX Association. http://dl.acm.org/citation.cfm?id=1855711.1855729.
Charles Clos. A study of non-blocking switching networks. Bell System Technical Journal, The, 32(2):406–424, March 1953.
Ronald van der Pol. IEEE 802.1ah basics (provider backbone bridges), March 2011.
Marco Foschiano and Sanjib HomChaudhuri. Cisco Systems’ private VLANs: Scalable security in a multi-client environment. RFC 5517, February 2010.
W. Townsley, A. Valencia, A. Rubens, G. Pall, G. Zorn, and B. Palter. Layer two tunneling protocol "L2TP".
RFC 2661, August 1999.
Danny McPherson and Barry Dykes. VLAN aggregation for efficient IP address allocation. RFC 3069, February 2001.
J. Lau, M. Townsley, and I. Goyret. Layer two tunneling protocol - version 3 (L2TPv3). RFC 3931, March 2005.
S. Bryant and P. Pate. Pseudo wire emulation edge-to-edge (PWE3) architecture. RFC 3985, March 2005.
Charles E. Leiserson. Fat-trees: Universal networks for hardware-efficient supercomputing. IEEE Trans. Comput., 34(10):892–901, October 1985. http://dl.acm.org/citation.cfm?id=4492.4495.
Radia Perlman. RBridges: Transparent routing. In Proceedings of the IEEE INFOCOM 2004, INFOCOM ’04, 2004.
Institute of Electrical and Electronics Engineers. IEEE 802.1Q-2005. 802.1Q - Virtual Bridged Local Area Networks, 2005.
Institute of Electrical and Electronics Engineers. IEEE 802.1D-1990. 1990.
Radia Perlman, Donald E. Eastlake 3rd, Dinesh G. Dutt, Silvano Gai, and Anoop Ghanwani. Routing bridges (RBridges): Base protocol specification. RFC 6325, July 2011.
Ieee.org, 802.1ad - provider bridges. http://www.ieee802.org/1/pages/802.1ad.html.
David R. Oran. OSI IS-IS intra-domain routing protocol. RFC 1142, February 1990.
Information technology – Telecommunications and information exchange between systems – Intermediate system to Intermediate system intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode Network Service (ISO 8473). ISO/IEC 10589, 1992.
Alan Ford, Costin Raiciu, Mark Handley, and Olivier Bonaventure. TCP extensions for multipath operation with multiple addresses. RFC 6824, January 2013.
Valentin Del Piccolo, Ahmed Amamou, William Dauchy, and Kamel Haddadou. Multi-tenant isolation in a TRILL based multi-campus network. In Cloud Networking (CloudNet), 2015 IEEE 4th International Conference on, pages 51–57, Oct 2015.
LISP overview... http://lisp.cisco.com/lisp_over.html.
IP addressing: NAT configuration guide, Cisco IOS XE release 3S.
http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/ipaddr_nat/configuration/xe-3s/nat-xe-3s-book.pdf.
Russell Housley and Scott Hollenbeck. EtherIP: Tunneling Ethernet frames in IP datagrams. Network Working Group, Request for Comments, 3378, 2002.
Vic Liu, Bob Mandeville, Brooks Hickman, Weiguo Hao, and Zu Qiang. Problem statement for VXLAN performance test. Work in progress, draft-liu-nvo3-ps-vxlan-perfomance-00, July 2014.
Cisco. Cisco Active Network Abstraction Reference Guide, 3.7, June 2010. Part 2 - Technology Support and Information Model Objects: Virtual Routing and Forwarding.
Victor Moreno, Fabio Maino, Darrel Lewis, Michael Smith, and Satyam Sinha. LISP deployment considerations in data center networks. Work in progress, draft-moreno-lisp-datacenter-deployment-00, February 2014.
S. Kent and K. Seo. Security architecture for the internet protocol. RFC 4301, December 2005.
Alan O. Freier, Philip Karlton, and Paul C. Kocher. The secure sockets layer (SSL) protocol version 3.0. RFC 6101, August 2011.
OpenStack security guide. Technical report, OpenStack Foundation, 2014. http://docs.openstack.org/security-guide/security-guide.pdf.
OpenFlow. https://www.opennetworking.org/sdn-resources/onf-specifications/openflow.
Stephen Nadas. Virtual router redundancy protocol (VRRP) version 3 for IPv4 and IPv6. RFC 5798, March 2010.
Darrel Lewis, David Meyer, Dino Farinacci, and Vince Fuller. Interworking between locator/ID separation protocol (LISP) and non-LISP sites. RFC 6832, January 2013.
Sami Boutros, Ali Sajassi, Samer Salam, Dennis Cai, Samir Thoria, Tapraj Singh, John Drake, and Jeff Tantsura. VXLAN DCI using EVPN. Work in progress, draft-boutros-l2vpn-vxlan-evpn-04, July 2014.
Gartner says nearly half of large enterprises will have hybrid cloud deployments by the end of 2017. https://www.gartner.com/newsroom/id/2599315.
E. Al-Shaer, H. Hamed, R. Boutaba, and M. Hasan. Conflict classification and analysis of distributed firewall policies.
IEEE Journal on Selected Areas in Communications, 23(10):2069–2084, 2005.
Amazon virtual private cloud. http://aws.amazon.com/vpc/.
Dan Harkins and Dave Carrel. The internet key exchange (IKE). RFC 2409, November 1998.
Yakov Rekhter, Tony Li, and Susan Hares. A border gateway protocol 4 (BGP-4). RFC 4271, January 2006.
Geoffrey Huang, Stephane Beaulieu, and Dany Rochefort. A traffic-based method of detecting dead internet key exchange (IKE) peers. RFC 3706, February 2004.
Pekka Savola. MTU and fragmentation issues with in-the-network tunneling. RFC 4459, April 2006.
Defense Advanced Research Projects Agency Information Processing Techniques Office. Internet protocol - DARPA internet program - protocol specification. RFC 791, September 1981.
V. Fuller and D. Farinacci. Locator/ID separation protocol (LISP) map-server interface. RFC 6833, January 2013.
Fred Baker, Eliot Lear, and Ralph Droms. Procedures for renumbering an IPv6 network without a flag day. RFC 4192, September 2005.
Charles E. Perkins. IP mobility support for IPv4, revised. RFC 5944, November 2010.
Charles E. Perkins, David B. Johnson, and Jari Arkko. Mobility support in IPv6. RFC 6275, July 2011.
Jari Arkko, Christian Vogt, and Wassim Haddad. Enhanced route optimization for mobile IPv6. RFC 4866, May 2007.
Dino Farinacci, Darrel Lewis, David Meyer, and Chris White. LISP mobile node. Work in progress, draft-meyer-lisp-mn-10, January 2014.
Bhumip Khasnabish, Bin Liu, Baohua Lei, and Feng Wang. Mobility and interconnection of virtual machines and virtual network elements. Work in progress, draft-khasnabish-vmmi-problems-03, December 2012.
Murari Sridharan, Yu-Shun Wang, Pankaj Garg, and Praveen Balasubramanian. NVGRE-EXT: Network virtualization using generic routing encapsulation extensions. Work in progress, draft-sridharan-virtualization-nvgre-ext-02, June 2014.
Technology white paper - TRILL. Technical report, HUAWEI TECHNOLOGIES CO., LTD., 2013.
http://www.huawei.com/ilink/enenterprise/download/HW_259594.  Radia Perlman, Donald Eastlake, Anoop Ghanwani, and Hongjun Zhai. Flexible multilevel trill (transparent interconnection of lots of links). Work in progress, draft-perlman-trillrbridge-multilevel-07, January 2014.  Sam Aldrin, Donald Eastlake, Tissa Senevirathne, Ayan Banerjee, and Santiago Alvarez. Trill data center interconnect. Work in progress, draft-aldrin-trill-data-center-interconnect00, March 2012.  Tissa Senevirathne, Les Ginsberg, Sam Aldrin, and Ayan Banerjee. Default nickname based approach for multilevel trill. Work in progress, draft-tissa-trill-multilevel-02, Mars 2013. Valentin Del Piccolo is a Phd student at the University Pierre et Marie Curie (UPMC) and at GANDI SAS where he works on virtualization and multi-tenant isolation in data centers networks. He received his M.S degree in network and computer science from the University Pierre et Marie Curie in 2013, Paris, France. Ahmed Amamou is a research engineer at GANDI SAS. He received the engineer degree in computer science from the National School of Computer science (Tunisia) in 2009 and the M.S degree in network and computer science from the same school in 2011; and the Ph.D degree in network and computer science from the University Pierre et Marie Curie in 2013, Paris, France. His research interests are Cloud computing and virtualization technologies. He is a member of the IEEE. Kamel Haddadou received the engineering degree in computer science from INI in 2000, the M.S degree in data processing methods for industrial systems from the University of Versailles, and the PhD degree in computer networks from University Pierre et Marie Curie (UPMC), in 2002 and 2007, respectively. In 2001, he was a research assistant at the Advanced 36 Technology Development Centre (CDTA), Algiers, Algeria. He is currently a research fellow at the Gandi SAS, France. 
Since 2003, he has been involved in several projects funded by the European Commission and the French government (RAVIR, ADANETS, Adminroxy, GITAN, OGRE, MMQoS, SAFARI, and ARCADE). His research interests are focused primarily on cloud computing and on resource management in wired and wireless networks. He is equally interested in designing new protocols and systems with theoretical concepts, and in providing practical implementations that are deployable in real environments. He has served as a TPC member for many international conferences, including IEEE ICC and GLOBECOM, and as a regular reviewer for major international journals and conferences in networking. He is a member of the IEEE.

Guy Pujolle received the PhD and "Thèse d'État" degrees in computer science from the University of Paris IX and Paris XI in 1975 and 1978, respectively. He is currently a professor at University Pierre et Marie Curie (UPMC - Paris 6), a distinguished invited professor at POSTECH, Korea, and a member of the Institut Universitaire de France. During 1994-2000, he was a professor and the head of the Computer Science Department of Versailles University. He was also a professor and the head of the MASI Laboratory at Pierre et Marie Curie University (1981-1993), a professor at ENST (1979-1981), and a member of the scientific staff of INRIA (1974-1979). He is the French representative at the Technical Committee on Networking at IFIP. He is an editor for the ACM International Journal of Network Management and Telecommunication Systems, and editor-in-chief of Annals of Telecommunications. He is a pioneer in high-speed networking, having led the development of the first Gbit/s network to be tested in 1980. He has participated in several important patents, in areas such as DPI and virtual networks.
He is the cofounder of QoSMOS (www.qosmos.fr), Ucopia Communications (www.ucopia.com), Ginkgo-Networks (www.ginkgonetworks.com), EtherTrust (www.ethertrust.com), Virtuor (www.VirtuOR.fr), and Green Communications (www.greencommunications.fr). He is a senior member of the IEEE.

Table 2: Comparison of the protocols' complexities

Host isolation

| Protocol | Control plane | Network restriction(s) | Tunnel configuration and establishment * | Tunnel management and maintenance | Multi-protocol | Security mechanism |
|---|---|---|---|---|---|---|
| Diverter | Distributed | Flat Layer 2 | Implicit | Yes, with forwarding table and rules in VNET | No, IP | VNET scans the traffic to enforce rules |
| BlueShield | Centralized, Directory Server (DS). Possible redundancy of DS | Layer 2 | Implicit | Yes, rules in the DS | Yes, Layer 2 | Echelon VMs scan the traffic to enforce rules |
| NetLord | Centralized, Configuration repository | Layer 2 and edge switches supporting IP forwarding | Implicit | Yes, with a SPAIN agent | No, Ethernet | None |
| VL2 | Centralized, directory system (ds) | Layer 3 and Clos topology | Implicit | Yes, mapping in the ds and VBL protocol | No, IP | VL2 agents enforce the rules |
| DOVE | Centralized, DOVE Policy Service | Layer 3 | Encapsulation protocol dependent | Encapsulation protocol dependent | Encapsulation protocol dependent | None |
| LISP | Centralized, Directory Name Server (DNS) | Layer 3 | Implicit | Yes, with mapping in the ITR and ETR | No, IP | None |
| NVGRE | Distributed | Layer 3 network and no fragmentation of NVGRE packets | Implicit | None | Yes, Layer 2 | None |
| STT | Distributed | Layer 3 network; middle boxes (firewalls) must permit STT packets | Implicit | None | No, Ethernet | None |

Core isolation

| Protocol | Control plane | Network restriction(s) | Tunnel configuration and establishment * | Tunnel management and maintenance | Multi-protocol | Security mechanism |
|---|---|---|---|---|---|---|
| PortLand | Centralized, Fabric manager for forwarding and addressing | Layer 2 multi-rooted fat-tree | Implicit | Yes, soft states | Yes | None |
| SEC2 | Centralized, Central Controller (CC) | Layer 2 | Implicit | Yes, rules in CC | No, IP | FEs enforce CC rules |
| 802.1ad | No | None | Explicit, GVRP | GVRP, join and leave messages by both end stations and bridges | No, Ethernet | None |
| 802.1ah | No | None | Explicit, GVRP | GVRP, join and leave messages by both end stations and bridges | No, Ethernet | None |
| VSITE | Centralized, Directory Server | None | Explicit, MPLS VPN on public network, and implicit in vstub | Yes, mapping in directory server and hypervisor | No, Ethernet | Hypervisors enforce rules of directory server |
| VNT | No | Layer 2 with TRILL-enabled edge switches | Implicit | Yes, temporary forwarding database entry in RBridges | Yes | None |
| VXLAN | No | Layer 3 network, no fragmentation of VXLAN packets, and IGMP querier function | Implicit | Join and leave messages by VTEPs | No, Ethernet | None |

* Implicit: based on the connectionless IP service model. Explicit: tunnel establishing procedure such as control message exchange or registration procedures.

Table 3: Comparison of the protocols' overhead

Host isolation

| Protocol | Encapsulation header | Messages | Component(s) |
|---|---|---|---|
| Diverter | None, but IP address restriction | Multicast ARP messages | VNET in each physical host, with VNET ARP engine |
| BlueShield | None | Directory look-up request | Directory server, vSwitch, ebtables firewall, BlueShield agent, Echelon VM |
| NetLord | MAC and IP encapsulation with address rewriting (MAC + IPv4 = 304 bits, MAC + IPv6 = 464 bits) | Address resolution based on the Diverter model. SPAIN agent request to the repository | NetLord Agent (NLA), SPAIN agent, edge switches with IP routing capacities, Configuration repository |
| VL2 | IP header with an LA address (160 or 320 bits) | Messages for registration and mapping. Look-up requests. Messages for directory. IP-based link state routing protocol for LA address assignment | VL2 agent, Directory system |
| DOVE | NVGRE, STT, or VXLAN headers | Policy requests. Messages for registration of rules and topology | dSwitches, DOVE Policy Service (DPS) |
| LISP | Outer IP header (IPv4: 160 bits, IPv6: 320 bits) + UDP header (64 bits) + LISP header (64 bits) = 288 (IPv4) or 448 (IPv6) bits | Map-Request, Map-Reply, Map-Register, Map-Notify, Encapsulated Control Message | xTR (ITR, ETR, PETR, PITR) |
| NVGRE | Outer Ethernet header (144 bits) + Outer IP header (IPv4: 160 bits, IPv6: 320 bits) + NVGRE header (64 bits) = 368 (IPv4) or 528 (IPv6) bits | None | NVGRE Endpoints |
| STT | Outer Ethernet header (144 bits) + Outer IP header (IPv4: 160 bits, IPv6: 320 bits) + TCP-like header (192 bits) + STT header (144 bits) = 640 (IPv4) or 800 (IPv6) bits | None | STT Endpoints |

Core isolation

| Protocol | Encapsulation header | Messages | Component(s) |
|---|---|---|---|
| PortLand | None | Unicast ARP in best case; in worst case, ARP broadcast to all end hosts. Location Discovery Protocol messages. Registration messages | Edge switches must perform MAC to PMAC header rewriting |
| SEC2 | MAC header (144 bits) | Unicast ARP. Customer messages for CC rules. Uses GARP protocol | Central Controller, Forwarding Elements, web portal |
| VSITE | MAC-in-MAC in Layer 2 network (144 bits). IP encapsulation in Layer 3 network (IPv4: 160 bits, IPv6: 320 bits) | OTV-like protocol messages. Directory lookup request | Directory server, cloud data center CEc, VSITE agent |
| 802.1ad | S-TAG (32 bits) + C-TAG (32 bits) = 64 bits | Generic Attribute Registration Protocol | Devices must abide by the 802.1ad standard |
| 802.1ah | B-DA (48 bits) + B-SA (48 bits) + B-TAG (32 bits) + I-TAG (48 bits) = 176 bits; + (optional) S-TAG (32 bits) + C-TAG (32 bits) = 240 bits | Generic Attribute Registration Protocol | Devices must abide by the 802.1ah standard |
| VNT | Encapsulation with a VNT header (192 bits) | TRILL messages. IS-IS protocol messages. SPF tree generated based on Link State PDU (LSP) messages | RBridges, Virtual Switch |
| VXLAN | Outer Ethernet header (144 bits) + Outer IP header (IPv4: 160 bits, IPv6: 320 bits) + Outer UDP header (64 bits) + VXLAN header (64 bits) = 432 (IPv4) or 592 (IPv6) bits | Join and leave messages | VXLAN Tunnel EndPoints (VTEP) |

Table 4: Comparison of the Host isolation protocols

| Protocol | Migration | Resilience | Scalability | Multi data center |
|---|---|---|---|---|
| Diverter | Live or offline migration, depending on timeout values | ECMP for multipath. Virtual gateway distributed among all the VMs of the sub-network | 16 million VMs system wide. However, the number of clients depends on the division of the IP address. The division must be done before starting the network, with no modification afterwards | Not specified. Possible with a Layer 2 interconnection. Control traffic travels between DCs. Creates one big network over multiple DCs |
| BlueShield | Live migration | Multiple replicas of the directory server. Possibility to use ECMP | Centralized controller. CPU load lessened by replicating the directory server, but memory is limited | Not specified. Possibly a directory server replicated in each data center for inter-data center communication rules |
| NetLord | Live migration. Uses NetLord Agent (NLA) messages NLA-HERE, NLA-NOTHERE and NLA-WHERE to signal the VM migration. VM's IP or MAC address unchanged | Relies on SPAIN for multipath. Configuration Repository might be replicated | 16777216 Tenant_IDs (24 bits). $V \times R \times \sqrt{F/2}$ virtual machines, where V = number of VMs per physical server, R = switch radix, F = FIB size in entries | Not specified, but possible with Layer 2 tunnels between data centers and one control plane spanning all the data centers |
| VL2 | Live migration. Separation of location addresses (LA) and application-specific addresses (AA) | Resilience provided by a Clos topology. Redundancy of the directory server | One directory server can manage up to 17000 lookups/sec. The lookup rate increases linearly with the number of servers | Not specified but possible. It will require a large directory system to manage the whole network |
| DOVE | Tunneling protocol dependent. Additionally, a dSwitch must inform the DPS when it detects a new VM | Multipath and routing resilience thanks to the tunnel protocol. Redundancy of the DOVE Policy Server | Number of tenants is tunnel protocol dependent: VXLAN has a 24-bit VNI (≈ 16000000), NVGRE also has a 24-bit VSID, and STT has a 64-bit Context ID ($\approx 1.8 \times 10^{19}$). The DPS is the scalability limiting component | Not specified but possible. It will require a large DPS to manage the whole network |
| LISP | IPv4 Mobility (RFC 5944), IPv6 Mobility (RFC 6275, RFC 4866). Endpoint is an xTR itself | Redundancy of xTR, MR and MS | Large number of EIDs and RLOCs possible. One RLOC address associated with multiple EID addresses. Issue with the maximum information saved by the MS and MR | Yes, even with a non-LISP data center |
| NVGRE | REDIRECT messages | Multipath possible but not included in NVGRE | One PA associated with multiple CAs. Suppresses most of the control plane broadcast messages and converts some of them into multicast messages | Yes, as a site-to-site VPN. Each site must have an NVGRE gateway |
| STT | No STT mechanisms | Multipath (ECMP) possible but not included in STT | Context ID field is 64 bits long. Issue with the virtual switch, which cannot manage this many IDs | Theoretically yes, as a site-to-site VPN. Practically no, because of the middle boxes issue |

Table 5: Comparison of the Core isolation protocols

| Protocol | Migration | Resilience | Scalability | Multi data center |
|---|---|---|---|---|
| PortLand | Live migration thanks to gratuitous ARP. Possibility of lessening the number of lost packets with redirection | Fat-tree topology induced resilience. Fabric manager backup, even with slightly non-identical information | Centralized controller. Huge stress on the Fabric manager. Not scalable by default: ≈ 27000 hosts at 25 ARP requests/second require a Fabric manager with 15 CPUs | Not specified. Layer 2 fat-tree topology to interconnect the core switches of each data center and get one network spanning multiple DCs. Not really feasible in practice given the cost induced by the interconnection topology |
| SEC2 | Live migration thanks to gratuitous ARP | Multiple FEs. Backups of the Central Controller (CC). Can use a distributed controller instead of the CC | Centralized controller. Huge stress on the Central Controller. Possibility to transform the CC into a distributed controller. Only 4096 VLANs per edge domain. The number of edge domains is not limited but depends on MAC address usage | Multi-domain by design, but with a scalability issue: only 4096 VLANs per domain |
| VSITE | Live migration if the VM stays in the same location, otherwise offline migration | Master/slave switch configuration with the Virtual Router Redundancy Protocol | VLAN for isolation. Aggregation of multiple VMs under a locIP | Multi-site by design, but with a scalability issue: only 4096 VLANs per site |
| 802.1ad | Need to allocate resources for the VLAN in the destination network ahead of time to have session continuity. Migration restricted to the same Layer 2 network | Link aggregation and switch redundancy | VLAN limit up to 16777216 | Yes, with the same VLANs on all data centers |
| 802.1ah | Need to allocate resources for the VLAN in the destination network ahead of time to have session continuity. Migration restricted to the same Layer 2 network | Link aggregation and switch redundancy | VLAN limit up to $\approx 7 \times 10^{16}$ | Yes, with the same VLANs on all data centers |
| VNT | Live migration. Based on TRILL or MLTP, which use RBridge nicknames for forwarding the messages, so the VM's IP or MAC address is unchanged | ECMP for multipath. Redundant multicast distribution tree. No centralized controller | Multiple VM MAC addresses aggregated under one RBridge nickname. VNI TAG (24 bits) allows for 16777216 virtual networks | Ready for multi data center. When using TRILL, it creates one big network with one control plane spanning all the data centers, whereas MLTP keeps each data center independent. Needs Layer 2 tunnels between data centers |
| VXLAN | Need to allocate resources for the VXLAN in the destination network ahead of time to have session continuity. Migration across a Layer 3 network possible | VTEP in hypervisor, so server redundancy is needed in order to migrate the VMs to a new server if the hypervisor is down | 16777216 VNIs possible | Possible to use VXLAN as a site-to-site VPN with VTEP gateways |
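
The encapsulation header totals reported in Table 3 are sums of fixed field sizes, so they can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the bit counts are copied from Table 3, while the dictionary layout and the function name `overhead_bits` are hypothetical names introduced for this example and do not come from any of the surveyed protocols.

```python
# Field sizes in bits, copied from Table 3.
OUTER_ETHERNET = 144
OUTER_IP = {"ipv4": 160, "ipv6": 320}
UDP = 64

# Fields added on top of the outer IP header for each protocol.
# (Hypothetical layout chosen for this sketch.)
EXTRA_FIELDS = {
    "LISP": [UDP, 64],                   # UDP header + LISP header
    "NVGRE": [OUTER_ETHERNET, 64],       # outer Ethernet + NVGRE header
    "STT": [OUTER_ETHERNET, 192, 144],   # outer Ethernet + TCP-like + STT header
    "VXLAN": [OUTER_ETHERNET, UDP, 64],  # outer Ethernet + UDP + VXLAN header
}


def overhead_bits(protocol: str, ip_version: str = "ipv4") -> int:
    """Return the total encapsulation overhead in bits for one frame."""
    return OUTER_IP[ip_version] + sum(EXTRA_FIELDS[protocol])


if __name__ == "__main__":
    for name in EXTRA_FIELDS:
        v4 = overhead_bits(name, "ipv4")
        v6 = overhead_bits(name, "ipv6")
        print(f"{name}: {v4} bits ({v4 // 8} bytes) over IPv4, "
              f"{v6} bits ({v6 // 8} bytes) over IPv6")
```

For instance, `overhead_bits("VXLAN")` gives 432 bits, that is 54 bytes added to every encapsulated frame over IPv4, which matches the VXLAN entry of Table 3.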