On the Scalability of the Controller in Software-Defined Networking
Agirman Reber
Faculty of Engineering
University of Liège
A thesis submitted for the degree of MSc in Computer Science
Academic year 2014-2015
Abstract
Software-defined networking (SDN) proposes a new approach to modern computer networks. The cornerstone idea it introduces consists of decoupling the control plane (i.e., the logic responsible for forwarding decisions) from the data plane (i.e., the set of devices that actually forward data). Thus, instead of a distributed control design, forwarding decisions are exclusively managed by a logically centralized controller. This abstraction enables layer-independent evolution on both planes and, more importantly, it provides finer-grained control over the network. One of the main concerns with SDN is its ability to scale as the network grows. This thesis aims at motivating the implementation of MPLS fast reroute to improve the controller's scalability.
Acknowledgements
I would like to express my sincere gratitude to my thesis supervisor, Professor Benoit Donnet, for his immense support and guidance throughout
the year. Our weekly meetings played an important role in my continuous
progress and helped me to structure my work schedule.
It is with great pleasure that I take this opportunity to also thank my fellow classmates. The different group activities we shared gave me moments of tranquillity and helped me refocus on my studies.
Finally, I deeply thank my parents, my sisters, and my elder brother for their endless support and unwavering love. Whenever I felt weary and tired, I could always count on their heart-warming encouragement. At last, I could not possibly forget to express my heartfelt gratitude to my cousin, Daniel. I am eternally in his debt for his wise advice and thoughtful comments.
Contents

1 Software-Defined Networks  1
  1.1 Introduction  1
  1.2 Motivations  2
    1.2.1 Network programmability  2
    1.2.2 Traffic engineering  3
    1.2.3 Management tasks  3
  1.3 Background and Limitations  4
    1.3.1 Router  4
    1.3.2 Switch  5
    1.3.3 Middlebox  5
  1.4 Limitations  6
    1.4.1 Network ossification  6
    1.4.2 Complex policies  6
  1.5 OpenFlow  7
    1.5.1 Flow tables  7
  1.6 Premise of SDN  8
    1.6.1 ForCES  8
    1.6.2 MBone  9
  1.7 Applications  10
    1.7.1 Wide-Area Network  10
  1.8 Challenges  10
    1.8.1 Scalability  10
    1.8.2 Controller placement  11
    1.8.3 Resilience  11

2 Problem Statement  12
  2.1 Main Objective  12
    2.1.1 Controller overhead  12
    2.1.2 Load distribution  13
  2.2 Literature Review  14
    2.2.1 Distributed model  14
      2.2.1.1 Onix  14
      2.2.1.2 Kandoo  15
      2.2.1.3 Advantages and Drawbacks  16
    2.2.2 Centralized model  16
      2.2.2.1 Devoflow  16
      2.2.2.2 Maestro  17
      2.2.2.3 Advantages and Drawbacks  17

3 MPLS Fast Reroute  18
  3.1 Current Situation  18
  3.2 Multiprotocol Label Switching  19
  3.3 Fast Reroute  20

4 Methodology  21
  4.1 Implementation Platform  21
  4.2 ns-3 OpenFlow Module  22
    4.2.1 Model  22
    4.2.2 Caveats  22
  4.3 Quantifying Traffic During Failure  23
    4.3.1 Topology model  24
    4.3.2 Characterizing traffic  25
    4.3.3 Traffic generation  25

5 Measurements Outlook  27
    5.0.4 I/O traffic rate  29
    5.0.5 Discussion  30
  5.1 Applying MPLS Fast Reroute  30
  5.2 One-to-one  31
    5.2.1 Backup path generation  32
    5.2.2 Backup path selection  32
    5.2.3 Implementation adjustments  32
    5.2.4 Results  33
    5.2.5 Discussion  36
  5.3 Many-to-one  37
    5.3.1 FRR table  37
    5.3.2 Backup path generation  38
    5.3.3 Backup path selection  38
    5.3.4 Results  38
    5.3.5 Discussion  40

6 Refined Topology Model  41
  6.1 Major upgrades  41
  6.2 Results  42
  6.3 Discussion  45

7 Conclusions  46
  7.1 Main Lessons  47
    7.1.1 One-to-one  47
    7.1.2 Many-to-one  47
    7.1.3 Applicability  47
  7.2 Limitations and Future Work  48
    7.2.1 The controller  48
    7.2.2 Topology model  48
    7.2.3 Traffic model  49
    7.2.4 Failure model  49

A UML Class Diagram  50

Bibliography  52
Chapter 1
Software-Defined Networks
1.1 Introduction
Software-Defined Networking (SDN) is a new and promising computer-networking paradigm that has galvanized both the scientific and industrial communities. Conceptually, the key idea is to dissociate the control plane, the logic that controls forwarding decisions, from the data plane in modern telecommunication devices.
Figure 1.1: SDN Architecture
Formally, the data plane represents the part of the network infrastructure responsible for transmitting data across a set of forwarding devices that are interconnected through physical or wireless channels.
Figure 1.1 illustrates the typical architecture of an SDN. This architecture is made of two main components: the controller and the forwarding devices. The controller, commonly referred to as the network operating system (NOS) [1], is a logically centralized entity that manages the network. The forwarding devices are solely responsible for forwarding data. The controller's network-wide view aims to facilitate innovation and simplify management tasks by providing a standard, vendor-agnostic entry point to the network. In essence, Software-Defined Networking is reviving the path to programmable networks. This new approach impacts current architectures in several respects:
• As mentioned earlier, the SDN paradigm introduces the concept of a network controller or operating system. User-defined applications access the controller through an API using high-level policies, which are then translated into low-level rules. The desired goal of this architecture is to simplify the development process of new protocols and applications.
• Forwarding decisions are flow-based. As in a firewall, each forwarding device has a flow table that groups the set of matching rules. Flows that match a particular flow rule are managed in the same way.
1.2 Motivations
In the early 1980s, the OSI model, depicting an approach by abstract layers, was adopted in the network communication field. This model solved the proprietary-related issues in heterogeneous networks and allowed layer-independent innovation. The proclaimed purpose of SDN is to reinstate this abstraction-based approach into network management in order to facilitate the development of new protocols and boost creative initiatives at the same time.
1.2.1 Network programmability
Software-defined networking has various advantages, among which is network programmability. As explained in the first section of this thesis, SDN reinstates the abstraction-based approach in computer networks. This abstraction operates on three levels:
• Forwarding: Forwarding rules transmission should be vendor-agnostic.
• Distribution: Control is logically centralized and as discussed later in this chapter, several tactics may be implemented to avoid the single-point of failure issue.
• Specification: Network applications express a desired behavior through a standard API, relieving developers of the implementation burden.
The latter is referred to as the northbound API and serves as a common interface for network application developers. Several working groups have emerged with their own solutions, such as Frenetic [2]. Their work focuses on network monitoring, policies, and configuration.
1.2.2 Traffic engineering
Several traffic engineering applications have been proposed, including ElasticTree [3]. Most of them aim at minimizing power consumption, maximizing aggregate network utilization, providing optimized load-balancing, or applying other generic traffic optimization techniques.
For instance, network devices that perform load-balancing at the data plane layer often rely on pre-configured policies and rules inherent to the surrounding topology [4]. They often produce adequate results, but they become a bottleneck for data traffic as the number of network nodes grows [4]. Different algorithms and techniques have been proposed for this purpose. One technique to allow this type of application to scale is to use wildcard-based rules to perform proactive load-balancing [5].
1.2.3 Management tasks
Over the last decades, network infrastructures have drastically increased in size and complexity [6]. Coupled with the wide range of equipment vendors and subsequent architectures, network management tasks are often laborious. The introduction of the Simple Network Management Protocol (SNMP) [7] aimed at circumventing this issue with a standardized API, allowing the network operator to monitor several devices (agents) by retrieving a chosen set of data from the Management Information Bases (MIB) stored on the target agent. Unfortunately, the discussions reported in RFC 3535 [8] mention a series of issues such as:
• Vendor-specific Command Line Interfaces (CLIs) disparity that undermines
standard SNMP solutions
• Performance issues
• Impractical low-level language
In addition to SNMP metrics, network operators often rely on different debugging
tools such as ping, netstat, traceroute or nmap to analyse or diagnose possible failures. While these tools may be sufficient in relatively small networks, they tend to
be old-fashioned or completely archaic in dense network infrastructures [9]. The rise
of software-defined networking reshaped the landscape in this domain.
Abstraction from the data plane hardware specifics can potentially make debugging and troubleshooting easier. For instance, ndb [9], a debugging tool for OpenFlow-based switches, facilitates the diagnosis of faulty networks.
1.3 Background and Limitations
Before explaining what urged the need for programmable networks, a brief outline of elementary network devices is presented. As the SDN revolution is occurring on the control and data planes, this section focuses on network and data link layer components.
1.3.1 Router
A router is a network device operating at the network layer (Layer-3) that routes
packets onto the desired itinerary, hence the name. Conceptually, a router can be
seen as a software-based or hardware component that manages a routing table. This
table lists the route and physical interface on which a specific packet will be forwarded.
Intra-domain routing decisions are derived from two major algorithms:
• Open Shortest Path First (OSPF) [10]: The loop-free routing tree is created
using the Dijkstra shortest path algorithm [11]. Different metrics are taken into
account when building the routing tree:
1. The available link bandwidth
2. The link reliability
3. The distance (Round-Trip-Time)
In case of a link failure or any event impacting the existing topology, the routing
tree is automatically updated.
• Routing Information Protocol (RIP) [12]: It belongs to the class of distance-vector routing protocols, where a weighting cost is assigned to each link in the network. Hop count, which measures the number of intermediate devices through which an IP datagram will transit, is the sole metric employed when building the routing tree.
1.3.2 Switch
A switch is a network device, generally operating at the data link layer (Layer 2), that forwards incoming frames onto one of its output ports. In practice, when a frame reaches the switch, it first looks up its Content Addressable Memory (CAM) in order to find the corresponding (MAC address, port number) pair.
PORT #   @MAC                  TYPE
1        aaaa.bbbb.cccc.dddd   Static
2        bbbb.aaaa.cccc.dddd   Dynamic
3        aaaa.bbbb.dddd.cccc   Static

Table 1.1: CAM table
If no match can be found, the switch floods the frame on every port. The Spanning Tree Protocol is employed to prevent network loops.
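To make this lookup behaviour concrete, the following self-contained C++ sketch models a minimal CAM table: a known destination MAC address is forwarded on its associated port, while an unknown one is flooded on every port except the ingress one. The class and function names are illustrative and do not correspond to any particular switch implementation.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal CAM model: destination MAC -> output port.
class CamTable {
 public:
  void Learn(const std::string& mac, uint32_t port) { table_[mac] = port; }

  // Returns the ports on which the frame must be sent.
  std::vector<uint32_t> Forward(const std::string& dstMac, uint32_t inPort,
                                uint32_t nPorts) const {
    auto it = table_.find(dstMac);
    if (it != table_.end()) {
      return {it->second};                     // exact match: single port
    }
    std::vector<uint32_t> flood;               // miss: flood on all other ports
    for (uint32_t p = 0; p < nPorts; ++p) {
      if (p != inPort) flood.push_back(p);
    }
    return flood;
  }

 private:
  std::unordered_map<std::string, uint32_t> table_;
};

int main() {
  CamTable cam;
  cam.Learn("aaaa.bbbb.cccc.dddd", 1);          // static entry from Table 1.1
  for (uint32_t p : cam.Forward("aaaa.bbbb.dddd.cccc", 0, 4)) {
    std::cout << "send on port " << p << "\n";  // unknown MAC -> flooded
  }
  return 0;
}
```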
1.3.3 Middlebox
A middlebox is an intermediary device performing functions apart from the normal, standard functions of an IP router on the data path between a source host and a destination host [13]. They include, among others:
• Firewall: A hardware or software-based device that shields a local network
from an uncontrolled and external network by filtering incoming and outgoing
packets. In practice, when configuring the firewall, a network manager will
have to list the set of functionalities and services that are granted access to the
Internet.
• Network address translator (NAT): Hides internal hosts' IP addresses behind one exposed IP address.
• Intrusion Detection System (IDS): Monitors the internal network activity and
produces reports if a malicious event is detected.
1.4 Limitations
1.4.1 Network ossification
The dynamic nature of user demand constantly forces IT managers to adapt the network infrastructure, develop new protocols, and re-configure firewalls. In today's architectures, the tight coupling between the data and control planes is the root cause of network ossification [14]. As a result, innovation and even simple management tasks tend to be time consuming, as IT operators need to take several vendor-specific parameters into consideration. Additionally, the fast-increasing presence of middleboxes adds yet another layer of complexity to network architectures. By interfering with IP packets, they impede the deployment of new protocols. The long and complicated deployment of IP multicast and IPv6 pinpoints this hidden reality in modern network computing [15].
1.4.2 Complex policies
Configuring policies in heterogeneous network topologies is a serious challenge for IT operators and, by extension, a time-consuming effort. Coupled with fast-evolving user demand, the task requires a serious set of skills to master its complexity. Vendors take advantage of this situation to develop specialized network equipment offering a convenient solution to network managers. As a result, the network is currently filled with complex hardware devices, produced by a myriad of network equipment vendors.
The SDN paradigm changes the current state of affairs by establishing a centralized logical node with a wide view of the network and providing specification abstraction. Interactions with the controller are done through a standard API, translating a desired behavior into low-level, vendor-agnostic instructions in the data plane.
1.5 OpenFlow
Maintained by the Open Networking Foundation (ONF), OpenFlow [16] is an open southbound interface that specifies the communication protocol (e.g., forwarding rule transmission, table modification) between the control plane and the forwarding devices in an SDN architecture.
1.5.1 Flow tables
An OpenFlow-enabled switch contains a flow table that holds several pieces of information, such as:
• Matching fields: Packet header, ingress port or MAC address
• Actions: Forward on a specific port, drop, etc.
• Counters: Number of packets/bytes received and other metrics.
Table 1.2 depicts the general structure of a flow table, which resembles a firewall rule set. Once a packet is received, the OpenFlow switch applies the corresponding actions if a match is found. In the case of a match miss, the behavior of the switch depends on the table-miss instruction (e.g., forward to the controller).
RULE                        ACTION                  STATS
IP (src - dst - prot)       Forward on port 1       # packets Tx/Rx
ETH (src - dst - type)      Drop                    # packets Tx/Rx
TCP (src port - dst port)   Forward to controller   # packets Tx/Rx

Table 1.2: Example of an OpenFlow switch flow table
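Conceptually, the flow table is an ordered list of (match, action) entries with per-entry counters that is consulted for every packet, falling back to a table-miss behaviour such as forwarding to the controller. The hypothetical C++ sketch below illustrates only that lookup logic; it is not the OpenFlow reference implementation.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

enum class Action { kForward, kDrop, kToController };

struct FlowEntry {
  std::function<bool(const std::string&)> match;  // predicate on a packet summary
  Action action;
  uint32_t outPort = 0;
  uint64_t packets = 0;                           // per-entry counter
};

// Returns the action applied; the table-miss behaviour is "send to controller".
Action Lookup(std::vector<FlowEntry>& table, const std::string& pkt) {
  for (auto& entry : table) {
    if (entry.match(pkt)) {
      ++entry.packets;
      return entry.action;
    }
  }
  return Action::kToController;                   // table miss
}

int main() {
  std::vector<FlowEntry> table;
  table.push_back({[](const std::string& p) { return p.find("tcp:80") != std::string::npos; },
                   Action::kForward, 1});
  table.push_back({[](const std::string& p) { return p.find("eth:bcast") != std::string::npos; },
                   Action::kDrop});

  std::cout << static_cast<int>(Lookup(table, "ip tcp:80")) << "\n";  // forward
  std::cout << static_cast<int>(Lookup(table, "udp:53")) << "\n";     // miss -> controller
  return 0;
}
```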
OpenFlow is now considered the most widespread southbound API and has already been adopted by various network equipment vendors (e.g., IBM [17], Brocade [18], and HP [19]) across different network environments such as data centers and service providers. Numerous projects have derived new versions of OpenFlow, improving flow-table management and overall bandwidth consumption. DevoFlow [20] in particular proposes a longevity-based flow repartition in which OpenFlow switches only manage short-lived flows.
1.6 Premise of SDN
Although the concept has only recently gained popularity, the idea of a programmable network emerged much earlier. The following table gives a glimpse of notable past projects that forged the path to software-defined networking:
Category                            Pre-SDN initiatives
Data plane programmability          xbind, IEEE P1520, smart packets, ANTS, SwitchWare, Calvert, high performance router, NetScript, Tennenhouse
Control and data plane decoupling   NCP, GSMP, Tempest, ForCES, RCP, SoftRouter, PCE, 4D, IRSCP
Network virtualization              Tempest, MBone, 6Bone, RON, PlanetLab, Impasse, GENI, VINI

Table 1.3: Previous projects on programmable networks [21]
1.6.1 ForCES
The Forwarding and Control Element Separation (ForCES) protocol [22], developed by the Internet Engineering Task Force (IETF), is the precursor of OpenFlow. Similarly, it introduces the concepts of:
• Forwarding Elements (FEs): entities that implement the ForCES protocol and provide per-packet processing.
• Control Elements (CEs): entities that instruct several FEs on how to process packets.
• Logical Function Blocks (LFBs): functional blocks residing in an FE and controlled by the CE to dictate packet processing and configure the FE.
However, unlike OpenFlow, the CEs and FEs are not necessarily separated into two distinct network entities. The ForCES protocol specification described in RFC 5810 [22] mentions security mechanisms such as message authentication and CE redundancy to reduce the risk of failures. ForCES aimed to facilitate intercommunication with network devices running different operating systems. Unfortunately, it failed to gain traction, as most network equipment vendors did not adopt it.
1.6.2 MBone
The Multicast Backbone (MBone) was developed in the early 1990s and aimed at providing efficient audio and video broadcast services across the Internet [23]. It was one of the early initiatives that solved IP multicast routing in WANs by setting up virtual network topologies on top of legacy networks. As depicted in Figure 1.2, MBone introduced the concept of IP tunneling, where multicast IP packets are encapsulated with a unicast IP header before reaching unicast routers.
Figure 1.2: IP tunneling in MBone
However, the protocol faced several challenges related to dynamic changes in multicast group membership or network conditions. Despite these performance and scalability concerns, MBone gained tremendous success, and by the end of the 1990s most routers could support IP multicast. The idea of IP tunneling served as a model for the 6Bone project, an early IPv6 deployment testbed developed by the IETF IPng working group [24].
1.7 Applications
1.7.1 Wide-Area Network
Software-defined networking can be applied in Wide-Area Networks (WANs) with a distributed controller design to obtain a resilient and scalable architecture. Onix was developed in this effort, providing a distributed control platform for large-scale production networks [25]. Aster*x, another project, proposes a load-balancing system for WANs and is presented in [26]. Such projects are also suited to data centers, where typical topologies contain thousands of network devices and require high flow-setup reactivity.
1.8 Challenges
1.8.1 Scalability
The first issue that comes into consideration is the tedious standardization task. Networking device vendors have built their own architectures and specifications; thus, forging a standard API that could satisfy each vendor's specific requirements is a non-trivial task.
As mentioned in the introductory section, the decoupling between the data plane and the control plane is the core characteristic of the software-defined networking paradigm. Network researchers quickly raised scalability concerns, as the centralized control logic naturally appears as a bottleneck in large networks. In fact, Handigol et al. [27] magnified those concerns with an early benchmark analysis, highlighting the fact that the controller was able to manage 30,000 flow setups per second with a 10 ms install delay. Thus, adopting the SDN approach in large networks such as data centers, where traffic intensity is enormous, represents a major challenge. Additionally, apart from the per-second flow management, one could argue that the memory space required for the flow table poses a threat to the scaling capabilities of SDN. A solution has been proposed in [28], which depicts a mechanism that reduces the control plane overhead by storing a shrunk version of the flow table in the controller.
1.8.2 Controller placement
The placement of the controller, or network operating system, is currently a prominent research and discussion topic in software-defined networking meetings. Whether the position is fixed or dynamic, fine-tuning tailored to the network environment is required. Heller et al. [29] discuss various placement metrics and their importance based on the surrounding network layout:
• Average-case latency: The objective is to determine the placement that minimizes the average latency from the controller to every node (this corresponds to a minimum k-median optimization problem: find the k centers that minimize the sum of distances between a set of nodes and the k centers [29]).
• Worst-case latency: In this case, the goal is to find the placement that minimizes the worst-case propagation delay over all nodes (this corresponds to a minimum k-center optimization problem, which partitions a set of nodes into k clusters, each node being assigned to its closest center [29]).
The paper also studies the placement effect using these metrics.
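For the special case of a single controller (k = 1), the average-latency criterion reduces to picking the node that minimizes the mean shortest-path latency to all other nodes. The sketch below assumes an all-pairs latency matrix dist has already been computed (e.g., with Floyd-Warshall) and simply scans for the best candidate; it is an illustration of the metric, not the evaluation performed in [29].

```cpp
#include <cstddef>
#include <iostream>
#include <limits>
#include <vector>

// dist[i][j] = shortest-path latency between nodes i and j (precomputed).
// Returns the node minimizing the average latency to every other node (k = 1).
std::size_t BestSingleControllerPlacement(const std::vector<std::vector<double>>& dist) {
  std::size_t best = 0;
  double bestAvg = std::numeric_limits<double>::max();
  for (std::size_t c = 0; c < dist.size(); ++c) {
    double sum = 0.0;
    for (double d : dist[c]) sum += d;
    double avg = sum / dist.size();
    if (avg < bestAvg) { bestAvg = avg; best = c; }
  }
  return best;
}

int main() {
  // Tiny 3-node example: node 1 is the most central.
  std::vector<std::vector<double>> dist = {{0, 1, 3}, {1, 0, 1}, {3, 1, 0}};
  std::cout << "controller at node " << BestSingleControllerPlacement(dist) << "\n";
  return 0;
}
```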
1.8.3 Resilience
Different papers have tackled the various limitations inherent to a centralized control architecture. The main issue is the single point of failure introduced by such a paradigm. A centralized design offers high efficiency in terms of data throughput across the network [30]. However, as the controller's surrounding environment evolves, one may prefer to trade efficiency for higher resiliency. Onix, a distributed controller architecture, has been proposed by Koponen et al. [25]. Their studies reveal that this architecture is able to provide a higher degree of resiliency. Compared to other distributed controllers, Onix ensures strong data consistency, which means that every single controller will obtain the most recent value in the flow table after a modification has been made.
Chapter 2
Problem Statement
2.1 Main Objective
This thesis addresses the scalability issues that emerge with a centralized controller model in an OpenFlow SDN environment. As explained in the previous chapter, decoupling the control logic from the data plane enables programmability, vendor-independent policies, and various other advantages. However, since its inception, SDN has been subject to a wide range of criticism, mainly related to controller performance and scalability.
2.1.1 Controller overhead
The OpenFlow protocol specification [31] states that a forwarding device can send asynchronous messages to the controller once a change in the network state is detected. Packet-in messages represent one of the main asynchronous message types; in practice, they are triggered in two situations:
1) When a table miss is detected. The packet-in message is then flagged with OFPR_NO_MATCH and, in the default configuration, sent to the controller.
2) When the action field in the switch's flow table explicitly specifies to send the packet to the controller. In this case, the OFPR_ACTION flag is employed.
Once a change in the topology occurs (e.g., a forwarding device is added), the controller might be overloaded with thousands of packet-in messages in a short amount of time. Tavakoli et al. [27] reported a generation rate of 2 million flows/s in their largest data center topology. Similarly, once a forwarding device collapses, the affected table entries for each flow will be removed (due to their idle timeout parameter) and flow-expired messages will be sent to the controller (in the default setup).
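The two trigger cases can be summarised by the reason flag carried in the packet-in header. The hypothetical controller-side handler below merely dispatches on that flag; the type and function names are assumptions and not the actual OpenFlow or ns-3 data structures.

```cpp
#include <iostream>
#include <string>

// Reason codes for packet-in, mirroring OFPR_NO_MATCH and OFPR_ACTION.
enum class PacketInReason { kNoMatch, kAction };

struct PacketIn {
  PacketInReason reason;
  std::string summary;   // simplified stand-in for the buffered packet
};

// Illustrative controller-side dispatch on the packet-in reason.
void HandlePacketIn(const PacketIn& msg) {
  switch (msg.reason) {
    case PacketInReason::kNoMatch:
      // Table miss: compute a route and install a flow-mod on the switch.
      std::cout << "no match for " << msg.summary << ": installing new rule\n";
      break;
    case PacketInReason::kAction:
      // A flow entry explicitly asked for controller processing.
      std::cout << "explicit send-to-controller for " << msg.summary << "\n";
      break;
  }
}

int main() {
  HandlePacketIn({PacketInReason::kNoMatch, "10.0.0.1 -> 10.0.0.2"});
  HandlePacketIn({PacketInReason::kAction, "ARP request"});
  return 0;
}
```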
Figure 2.1: Schematic view of the first problem
Figure 2.1 portrays a high level view of the situation. In this thesis, the objective
is to implement and evaluate a scalable mechanism to reduce switch-to-controller
contention.
2.1.2 Load distribution
In this subsidiary problem, the objective is to determine which approach should be adopted at the control plane level regarding traffic load distribution after a topology change (e.g., the arrival of a new forwarding device). Figure 2.2 presents a schematic representation of the situation.
Figure 2.2: Schematic view of the second problem
Increasing the number of controllers is a rather natural and straightforward solution. Analogously to a multicore computer, a fine-tuned synchronization and load-balancing policy has to be developed in such a multi-controller architecture. Also, apart from the technical aspect, the financial cost of the controller pool might be a deterrent.
2.2 Literature Review
As discussed in the Challenges section of the first chapter, several scientific papers have been published regarding the scalability issues and possible improvements of the SDN paradigm. Broadly, two major approaches can be distinguished: a distributed controller architecture and a centralized model. In the distributed setup, the control plane spans two or more controller nodes, and a given controller can be in charge of a specific network area. Depending on the controller layout model (e.g., horizontal or hierarchical), the distributed approach requires a standardized East/West API (in the case of a horizontal distribution) or a North/South API (hierarchical model) for inter-controller communication. In the centralized model, the control plane can still be physically spread among multiple nodes as long as the SDN philosophy of a globally visible controller with a network-wide view is respected.
2.2.1 Distributed model
2.2.1.1 Onix
Onix [25] is defined as a production-quality platform on top of which the control plane can be implemented as a distributed system. This work derives from previous research projects, namely Ethane [32] (the first instance of an OpenFlow controller) and NOX [27], from which it expands the northbound API expressiveness, facilitating the development of scalable network applications. In addition, the authors introduce the concept of the Network Information Base (NIB), the data structure that gathers the network state information. Figure 2.3 illustrates a high-level view of the Onix architecture.
Figure 2.3: Onix architecture
East/West API. The NIB is replicated and distributed among all Onix nodes
to ensure scalability, resiliency and consistency. The authors specify that as part
of their scalability strategies, Onix controllers are given the possibility to maintain
only a subset of the NIB up-to-date. Furthermore, similarly to ATM networks, Onix
facilitates the aggregation of a group of controllers into a single logical Onix instance,
thus, limiting overall inter-controller traffic in hierarchical setups.
2.2.1.2 Kandoo
Kandoo [33] has also adopted a distributed approach at the control plane level. The notable difference compared to Onix is its control plane layout. As depicted in Figure 2.4, Kandoo introduces a two-level hierarchical model where bottom-layer controllers are isolated from their surrounding neighbors. Although deprived of a network-wide view, they provide a suitable control platform for locally scoped network applications (e.g., Learning Switch, Link Layer Discovery Protocol). On the other hand, the top-layer controller is responsible for maintaining the network-wide state and accordingly serves the related network applications (e.g., Load Distribution).
Figure 2.4: Kandoo distributed model
Their evaluation results showed that, compared to a traditional OpenFlow controller, the bandwidth and message load were effectively reduced by an order of magnitude. Additionally, the authors insist that further improvement can be obtained at the root controller level. Kandoo only imposes a logically centralized root controller; hence, in practice, it can be physically distributed as in Onix.
2.2.1.3 Advantages and Drawbacks
Whether in a hierarchical or a horizontal system, the distributed model presents several advantages and drawbacks. It appears that scalability comes at the expense of weaker consistency: once a change in network state occurs, several control nodes may hold conflicting information. Onix offers two NIB update mechanisms, letting application developers make their own trade-off between consistency and performance. Along with scalability, one might argue that a distributed approach resolves the resiliency issues of centralized models. Finally, regarding the financial aspect of a multi-controller model, Curtis et al. [3] have presented a migration protocol to dynamically turn a forwarding device into a controller. The migration is triggered when several chosen traffic parameters reach a certain threshold level.
2.2.2 Centralized model
2.2.2.1 Devoflow
DevoFlow [20] is a variant of the OpenFlow model aspiring to fulfill the scalability requirements of dense network architectures. The authors claim that OpenFlow's current design is not suited for high-performance networks, fine-grained flow control being pointed out as the root cause. Consequently, they propose to slightly bend the control plane - data plane decoupling principle in order to effectively relieve the controller from excessive per-flow manipulations. Specifically, as shown in Figure 2.5, their approach consists of segregating the management of short-lived flows from that of elephant flows (i.e., significant flows).
Figure 2.5: Flow segregation in Devoflow (a given flow is tagged as “elephant” as
soon as it has transferred at least a threshold number of bytes (∼ 1-10MB))
The data plane inherits the management of short-lived flows, as they generally represent a large proportion of the switch-to-controller traffic load. Statistics collection mechanisms are also handled at the data plane level, preventing controller overhead caused by pull- or push-based statistics retrieval.
2.2.2.2 Maestro
Maestro [34] is an OpenFlow-compliant controller design that leverages multi-threading to boost its performance. Emphasizing the critical aspects of a centralized controller design (e.g., scalability and performance), the authors argue that parallel computing can improve the SDN controller's efficiency. While it exploits recent advances in parallel processing, Maestro's design maintains a single-threaded programming model for usability reasons. Network developers can feed single-threaded applications to the Maestro controller and expect them to be executed concurrently with near-linear scalability.
2.2.2.3 Advantages and Drawbacks
As with the distributed model, the centralized architecture has its own positive and negative facets. First and foremost, despite the various optimization efforts, a centralized control logic remains subject to the single-point-of-failure issue; needless to say, a controller failure in a dense network can have harmful consequences. Additionally, as the network expands both in size and space, the centralized model will inevitably encounter several limitations.
Chapter 3
MPLS Fast Reroute
As previously explained, the intent of this thesis is to implement a scalable mechanism to reduce the controller overhead provoked by switch-to-controller messages. In this perspective, a possible improvement to the current OpenFlow (OF) communication model can be found in the Multiprotocol Label Switching (MPLS) protocol [35]. More precisely, the fast reroute (or local repair) mechanism in MPLS offers a valid starting point to achieve the desired scalability property. In short, this mechanism enables switching nodes to instantly redirect traffic in the event of node or link failures without having to re-compute a backup path. Implementing fast reroute in OF-based networks requires the controller to proactively push backup rules to each switching node. Thus, this mechanism can potentially introduce an additional load on the controller and, as such, requires a careful deployment. In this chapter, a base study case will be modeled in ns-3 in order to motivate the need for a failure recovery mechanism.
3.1 Current Situation
According to OpenFlow specification v0.8.9 [31], once a forwarding device detects a status modification on one of its links, an ofpt-port-status message is sent to the controller. Briefly, this message contains information such as the type of event that triggered the status change, the affected port number, and its new state. The controller is then responsible for re-routing each flow that was transiting through that port. In practice, this is done through flow-mods messages generated by the controller and sent to each forwarding device on the updated route. In a dense network, one can immediately infer that this fault recovery model is subject to limitations, as it potentially requires the controller to handle a large amount of messages in a short time frame. Link and node failures are considered rare events [36]; however, several other situations can benefit from the deployment of a fast recovery mechanism. For instance, network applications might require QoS guarantees to cope with user demand (e.g., real-time applications such as Voice over IP). If the network is subject to congestion, the controller is expected to adequately re-route each flow with respect to its QoS requirements, and if this process is not managed proactively, the control plane becomes a bottleneck.
3.2 Multiprotocol Label Switching
MPLS (RFC 3031 [35]) is a network protocol developed by Cisco researchers in the late 1990s. The primary goal of MPLS is to improve forwarding performance while maintaining traffic-engineering capabilities. Although the magnitude of its deployment is still under study, the MPLS protocol is increasingly present in today's networks. Since IP packets are encapsulated when entering an MPLS tunnel, conventional topology discovery techniques that rely on the IP TTL field are unable to see inside the tunnel. Exploiting two MPLS debugging features (RFC 4950 [37] and ttl-propagate), Donnet et al. [38] developed inference techniques to reveal the presence of MPLS tunnels.
Conceptually, implementing MPLS implies two adjustments:
• In a classic IP-based network, packets are forwarded based on a specific Forwarding Equivalence Class¹ (FEC), which can differ from one router to another. Upon entering an MPLS network, packets are assigned a label (a 20-bit value) that corresponds to a specific FEC. This assignment is performed by the Ingress Hop Router (i.e., the MPLS tunnel entry point) and is done only once (assuming that the MPLS Label Distribution Protocol [39] is in place). In practice, though, labels can be swapped along the Label Switched Path (LSP).
• After the Ingress Hop Router, forwarding decisions are no longer based on the network-layer header, thus avoiding the time-consuming longest-prefix-match mechanism. Instead, packets are forwarded based on their label, which serves as an index into the switches' forwarding tables.
¹ Forwarding policy for a group of packets.
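Because the label directly indexes the forwarding table, a label-switching router avoids any longest-prefix match. The short sketch below illustrates this label-swapping step for a single router; it is a conceptual illustration only, with all names chosen for this example.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct LabelEntry {
  uint32_t outLabel;   // label to swap in (a 20-bit value in real MPLS)
  uint32_t outPort;    // egress interface
};

// Per-router label forwarding table: incoming label -> (outgoing label, port).
using LabelTable = std::unordered_map<uint32_t, LabelEntry>;

int main() {
  LabelTable lfib = {
      {100, {200, 1}},   // swap 100 -> 200, send on port 1
      {101, {201, 2}},   // swap 101 -> 201, send on port 2
  };

  uint32_t incoming = 100;                 // label pushed by the ingress router
  auto it = lfib.find(incoming);           // O(1) lookup, no longest-prefix match
  if (it != lfib.end()) {
    std::cout << "swap " << incoming << " -> " << it->second.outLabel
              << ", forward on port " << it->second.outPort << "\n";
  }
  return 0;
}
```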
3.3 Fast Reroute
MPLS fast reroute (RFC 4090 [40]) is an extension of the Resource Reservation Protocol - Traffic Engineering (RSVP-TE) protocol (RFC 3209 [41]) introduced with MPLS. It provides a mechanism to quickly re-direct traffic onto a backup LSP once the primary LSP is broken (e.g., node or link failure). This reactiveness is obtained by computing and signaling backup LSPs prior to any node (or link) failure. The mechanism is particularly useful for real-time applications such as Voice over IP (VoIP), where even a slight increase in delay can degrade the users' communication. Figure 3.1 portrays a high-level view of MPLS fast reroute.
Figure 3.1: MPLS fast reroute (or local repair) mechanism. In this scheme, the
protection covers possible failures of node A or B
As we can see, the backup LSP can cover either a node or a link failure. Ideally, the backup LSP should be very close to the primary LSP so that packets can be sent back onto it whenever possible. It is believed that MPLS fast reroute can relieve the switch-to-controller traffic strain in the event of a failure, as backup paths are immediately available. In order to validate this argument, a general case study will be modeled in ns-3. The goal is to measure the traffic load on the controller during a node failure.
Chapter 4
Methodology
4.1 Implementation Platform
The framework of this research project is built on ns-3 [42], the open-source and globally renowned network simulator. Funded by the National Science Foundation (NSF) and released in 2008, ns-3 has become one of the most popular simulation tools in the networking research community. Different versions have been released over the past years, as the software keeps expanding in terms of capabilities and modules. In the context of this research project, version 3.19 of ns-3 has been installed. All tests are executed on Ubuntu 13.10 (GNU/Linux 3.11.0-12-generic i686), running in Oracle VirtualBox v4.2.6 (revision 82870). The host machine is a 2.26 GHz Intel dual-core MacBook Unibody, running Mac OS X Snow Leopard (v10.6.8).
Note: A compilation error might occur during the installation of the ns-3 OpenFlow module on Ubuntu 13.10. The boost library path in the configure script refers to /usr/lib64 instead of the actual boost libraries, which are located in /usr/lib/x86_64-linux-gnu/. To solve this issue, one can run the configure process as follows: ./configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu. Another fix is to install (using apt-get) the libboost-all-dev v1.54 package.
4.2 ns-3 OpenFlow Module
The OpenFlow software implementation distribution (OFSID) used with ns-3 is the one developed by Ericsson researchers [43] and complies with OpenFlow switch specification v0.8.9. The OpenFlowSwitchNetDevice class (which models the actual OpenFlow (OF) switch object) is provided along with an OpenFlowSwitchHelper class. The latter is employed to facilitate the installation process on forwarding nodes. The OFSID also has MPLS capabilities implemented in the OpenFlowSwitchNetDevice class, though these are irrelevant to the purpose of this research project.
4.2.1 Model
Similarly to an OF switch, an OpenFlowSwitchNetDevice class object contains a flow table that can be modified by the controller to update its forwarding rules. Naturally, the communication protocol complies with the OpenFlow switch specification. The OFSID provides two basic types of controllers:
• Drop Controller: As the name suggests, this type of controller builds a flow rule with a drop action for each new flow detected. The DropController class serves as a basic demonstration of the flow table implementation.
• Learning Controller: Here, the controller's role is more elaborate. When a packet-in is received, the controller effectively turns the emitting forwarding device into a Layer 2 MAC learning switch.
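For reference, the fragment below sketches how an OpenFlow switch with the internal learning controller is typically instantiated, loosely following the openflow-switch example shipped with ns-3.19. It assumes ns-3 was built with OpenFlow support enabled; helper and class names should be checked against the installed release before reuse.

```cpp
// Sketch loosely based on the openflow-switch example shipped with ns-3.19.
// Helper names are assumptions and may differ across releases.
#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/csma-module.h"
#include "ns3/openflow-module.h"

using namespace ns3;

int main (int argc, char *argv[])
{
  NodeContainer hosts;      hosts.Create (2);
  NodeContainer switchNode; switchNode.Create (1);

  // The module only supports CSMA links between hosts and the OF switch.
  CsmaHelper csma;
  csma.SetChannelAttribute ("DataRate", DataRateValue (DataRate ("1Gbps")));
  csma.SetChannelAttribute ("Delay", TimeValue (MilliSeconds (2)));

  NetDeviceContainer hostDevices, switchDevices;
  for (uint32_t i = 0; i < hosts.GetN (); ++i)
    {
      NetDeviceContainer link =
        csma.Install (NodeContainer (hosts.Get (i), switchNode.Get (0)));
      hostDevices.Add (link.Get (0));
      switchDevices.Add (link.Get (1));
    }

  // Attach the internal learning controller to the OpenFlow switch device.
  OpenFlowSwitchHelper ofHelper;
  Ptr<ofi::LearningController> controller = CreateObject<ofi::LearningController> ();
  ofHelper.Install (switchNode.Get (0), switchDevices, controller);

  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}
```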
4.2.2 Caveats
As in any simulation model, the ns-3 OpenFlow module has several limitations in terms of functionality and realism. Although they do not have a critical impact on the effectiveness of this research, it is worth mentioning some of them:
• The OpenFlowSwitchNetDevice class has to be attached to an internal controller (e.g., either a drop or learning controller provided under the ofi namespace). The documentation page [44] states that a future release could facilitate
the use of an external controller through an EmuNetDevice class object. This
class enables packets to be exchanged between a simulation node and a real
network device.
• The SSL connection between OF switches and the controller is not modeled. As mentioned in the documentation pages, the connection between OF switches and the controller being local, a channel or link failure is unlikely to happen. As part of this research, only inter-switch link failures will be considered; thus, fault recovery for the secure channel is out of scope.
• The last notable limitation is related to the IEEE 802.1D MAC Bridges standard. The LearningController class does not implement the Spanning Tree Protocol, thus forcing users to be careful when building their topology.
4.3 Quantifying Traffic During Failure
Different approaches can be adopted to measure traffic in terms of packet exchanges, delay, or bandwidth usage. In this case, since the OpenFlow module does not model the "physical" switch-to-controller link, delays and bandwidth usage are inferred from OF packet sizes. As for packet tracing, although a patch file is included in the OpenFlow package to extend Wireshark's protocol modules, the linking process could not be properly executed. To this end, an OpenFlowStats class has been created to record incoming and outgoing (a)synchronous messages at the control plane level.
The excess of traffic resulting from one or several switching node failures is inferred as follows:
• First, a failure-less scenario is modeled in the simulator. The topology consists of 639 switches and 128 end-hosts attached to edge nodes, and the simulation lasts 20 seconds. Statistics measurements are focused on I/O traffic at the control plane. This simulation serves as a comparison basis for the second scenario, where a switch might collapse during the simulation.
• In this second model, switches are scheduled to collapse at a random (uniformly distributed) time. It is important to note that only edge switches are set to collapse, as the failure of any higher-level switch would result in a disconnected graph, thus precluding any failure recovery mechanism. A parameter, passed as an argument at execution time, indicates the desired number of failures.
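The failure scheduling itself can be expressed directly with the simulator's event scheduler: draw a uniformly distributed instant within the simulation window and schedule a failure routine for each requested failure, as sketched below. The ns-3 calls shown exist in ns-3.19, but FailSwitch and the way a node is actually "collapsed" are placeholders for the implementation used in this thesis.

```cpp
// Hypothetical failure scheduling; FailSwitch is a placeholder for however the
// simulation actually disables an edge switch (e.g., bringing its ports down).
#include <iostream>
#include "ns3/core-module.h"
#include "ns3/network-module.h"

using namespace ns3;

static void FailSwitch (Ptr<Node> node)
{
  // Placeholder: disable the node's devices / remove it from the topology here.
  std::cout << "switch " << node->GetId () << " fails at "
            << Simulator::Now ().GetSeconds () << "s" << std::endl;
}

void ScheduleFailures (const NodeContainer &edgeSwitches,
                       uint32_t nFailures, double simDuration)
{
  Ptr<UniformRandomVariable> rng = CreateObject<UniformRandomVariable> ();
  rng->SetAttribute ("Min", DoubleValue (0.0));
  rng->SetAttribute ("Max", DoubleValue (simDuration));

  for (uint32_t i = 0; i < nFailures && i < edgeSwitches.GetN (); ++i)
    {
      double t = rng->GetValue ();                       // uniform failure time
      Simulator::Schedule (Seconds (t), &FailSwitch, edgeSwitches.Get (i));
    }
}

int main (int argc, char *argv[])
{
  NodeContainer edgeSwitches;
  edgeSwitches.Create (4);                               // stand-in edge switches
  ScheduleFailures (edgeSwitches, 2, 20.0);              // two failures within 20 s
  Simulator::Stop (Seconds (20.0));
  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}
```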
4.3.1 Topology model
As previously explained, since the Spanning Tree Protocol is not modeled in this version of ns-3, a tree-shaped topology is created. Employing existing topology reader modules such as RocketFuel [45] and Orbis [46] would clearly improve the simulation realism, but this is not feasible given the OpenFlow module restrictions. Instead, an additional class has been created to serve this purpose. This class, TopologyBuilder, takes as parameters the node degree and the maximum recursion level, and creates the corresponding OpenFlow switch topology. An additional parameter specifies the number of end-host nodes to be attached at the leaves (edge switches). Figure 4.1 shows the topology as rendered by the Python visualizer module in ns-3. The links are all Gigabit Ethernet with a 2 ms propagation delay.
Figure 4.1: Topology model used during the simulation (end-hosts: gray; OF switches: green). The controller is not represented since it remains internal to the OpenFlow module
Note: OF switches in ns-3 are used in pairs with Carrier Sense Multiple Access (CSMA) channels, since the OpenFlow module does not allow any other type of link, such as point-to-point. In order to avoid useless traffic when two neighboring end-hosts establish a connection, the TopologyBuilder class object installs one CSMA link per attached end-host.
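The tree construction performed by TopologyBuilder can be reduced to a short routine that, given a node degree and a recursion depth, produces the parent-child links level by level. The sketch below only builds that structure; wiring the CSMA channels and attaching end-hosts is omitted, and all names are illustrative.

```cpp
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Builds a switch tree of the given degree and depth and returns the list of
// (parent, child) links; node 0 is the root. Purely structural illustration.
std::vector<std::pair<std::size_t, std::size_t>> BuildTree(std::size_t degree,
                                                           std::size_t depth) {
  std::vector<std::pair<std::size_t, std::size_t>> links;
  std::vector<std::size_t> current = {0};   // current recursion level
  std::size_t nextId = 1;
  for (std::size_t level = 0; level < depth; ++level) {
    std::vector<std::size_t> next;
    for (std::size_t parent : current) {
      for (std::size_t d = 0; d < degree; ++d) {
        links.emplace_back(parent, nextId); // link parent switch to a new child
        next.push_back(nextId++);
      }
    }
    current = std::move(next);              // children become the next level
  }
  return links;
}

int main() {
  auto links = BuildTree(2, 3);             // degree 2, 3 levels -> 14 links
  std::cout << links.size() << " switch-to-switch links\n";
  return 0;
}
```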
4.3.2 Characterizing traffic
Modeling realistic network traffic for simulation purposes requires a fine-tuned analysis based on ground-truth properties of large network infrastructures such as data centers or campus networks. Benson et al. [47] have conducted a series of measurements on different classes of data centers to identify their intrinsic properties. Briefly, they characterized the traffic in terms of flow generation, flow delays, application types, and several other metrics. The authors claim that their study offers valuable information to further refine traffic-engineering models and QoS policies. However, only a limited number of data centers were considered; thus, their observations do not necessarily apply to every type of data center. Nevertheless, the results drawn from their study will serve as input to the simulator traffic generator. Table 4.1 gathers the main characteristics observed during their measurements.
Data center role   Active flows     Flow arrival   Flow size   Flow duration
University         10 - 500         4 - 40 ms      < 10 KB     < 5 s
Private            1,000 - 5,000    < 1 ms         < 10 KB     < 11 s
Commercial         5,000 - 10,000   < 1 ms         < 10 KB     < 11 s

Table 4.1: Data center flow characteristics [47]
During the various simulation scenarios, only private and commercial traffic types will be modeled. Although SDN is increasingly emerging on campus networks [16], modeling university traffic would not be appropriate for this particular simulation environment: given the flow generation intensity (10 - 500 flows) and the simulation duration (20 seconds), the measurements might not be meaningful.
4.3.3 Traffic generation
RandomTrafficGenerator, as the name suggests, is where flow generation is initiated. The class object receives as parameter a set of simulation nodes and schedules network applications at random times. In practice, the class object requires a timer specifying when the simulation ends; then, every X¹ seconds, a new wave of flows is generated. During each generation phase, two nodes are selected at random and a UDP client-server application is initiated. The flow characteristics depend on the type of network being modeled (e.g., data center or campus).
¹ a random variable that follows a discrete uniform distribution
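Conceptually, the generator is a self-rescheduling event: pick two distinct nodes, start a UDP client-server pair between them, then re-schedule itself after a uniformly drawn gap until the simulation deadline. The routine below sketches that loop with ns-3 primitives; StartUdpFlow is a placeholder for the application setup actually performed by RandomTrafficGenerator, and the 0.1-1.0 s gap is an arbitrary example.

```cpp
// Sketch of a self-rescheduling flow generator; StartUdpFlow is a placeholder
// for the UDP client/server installation performed by RandomTrafficGenerator.
#include "ns3/core-module.h"
#include "ns3/network-module.h"

using namespace ns3;

static void StartUdpFlow (Ptr<Node> src, Ptr<Node> dst)
{
  // Placeholder: install a UDP client on src and a UDP server on dst here.
}

void GenerateWave (NodeContainer hosts, double stopTime)
{
  if (Simulator::Now ().GetSeconds () >= stopTime)
    {
      return;                                           // simulation window over
    }

  Ptr<UniformRandomVariable> rng = CreateObject<UniformRandomVariable> ();
  uint32_t a = rng->GetInteger (0, hosts.GetN () - 1);  // pick two hosts at random
  uint32_t b = rng->GetInteger (0, hosts.GetN () - 1);
  if (a != b)
    {
      StartUdpFlow (hosts.Get (a), hosts.Get (b));
    }

  double gap = rng->GetValue (0.1, 1.0);                // next wave in X seconds
  Simulator::Schedule (Seconds (gap), &GenerateWave, hosts, stopTime);
}

int main (int argc, char *argv[])
{
  NodeContainer hosts;
  hosts.Create (8);
  Simulator::Schedule (Seconds (0.0), &GenerateWave, hosts, 20.0);
  Simulator::Stop (Seconds (20.0));
  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}
```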
Although this traffic generation design provides a relatively accurate model, a second version of this class could benefit from the advances made by Ammar et al. [48]. In
their paper, they present a new tool able to generate realistic Internet traffic in ns-3.
Based on the Poisson Pareto Burst Process (PPBP), a Long-Range Dependent (LRD)
model for network traffic, it can effectively match statistical properties of real-life IP
networks. Unfortunately, this enhanced version of the traffic generator will introduce
a non-negligible overhead in terms of memory and CPU usage and is thus left for
future development.
See Appendix A for a high level view of the simulation model classes. The source
code of this project is also available at the following address:
http://tinyurl.com/tfe-sdn
Chapter 5
Measurements Outlook
Figures 5.1 and 5.2 gather different measurement results drawn from the simulations. Starting with the first graph, related to the first scenario (i.e., failure-free), we can see that the total number of OF messages exchanged varies between 190,000 and 250,000. As expected, packet-in and flow-mods messages constitute the majority of communications. A large number of packet-in messages is generated at the beginning of each flow initiation, where an ARP resolution is triggered by the client node. This peak of traffic is also observed for flow-mods messages, since the controller has to install a new rule to flood the corresponding packet.
Figure 5.1: CDF of flow-mods and packet-in messages recorded at the control plane in the failure-free scenario
Proceeding with Figure 5.2(a) (failure scenarios), it appears that, in general, the total amount of messages has nearly doubled. Unsurprisingly, this increase in traffic is caused by packet-in and flow-mods messages. Once all surrounding switches have updated their port states, each active flow that was transiting through the collapsed node is considered a new flow and thus triggers a packet-in. The controller then updates the switch flow table accordingly to determine the new output port for that flow. This tendency is confirmed by the next two simulation scenarios, where an increasing number of node failures are scheduled.
Figure 5.2: CDF of output (flow-mods) and input (packet-in) messages recorded at the control plane, with an increasing number of failures (f=i, where i is the number of failures): (a) CDF of packet-in messages; (b) CDF of flow-mods messages. Measurements expose a strong escalation of I/O traffic at the control plane during failure scenarios
Interestingly, although an increasing number of OpenFlow messages were reported, it appears that this excess of traffic tends to stabilize. A rapid inspection of every trace file generated by the OpenFlowStats class object showed that the controller was in fact receiving ofpt-error-msg packets, as shown in Figure 5.3. The OpenFlow specification states that this type of message is used to notify the controller of a problem. In this case, the error code indicates that switches were not able to modify their flow tables (ofpet-flow-mod-failed).
Figure 5.3: CDF of error messages with one failure (f=1), 5 failures (f=5), 10 failures (f=10), and 20 failures (f=20)
Further investigation showed that this was caused by a lack of memory space. Without going into all the details, two types of data structures are implemented in the OpenFlow module: a simple linked list and a hash table, both with a few hundred available entries. Thus, as the network graph shrinks, the traffic is aggregated onto a small subset of active switching nodes, causing their flow tables to overflow. At this point, any attempt to initiate a new client-server application inevitably fails. Consequently, this stabilization effect is due to a "virtual" loss of connection during the client-server application setup.
5.0.4 I/O traffic rate
The excess of traffic during a node failure can also be expressed in terms of bandwidth usage. Table 5.1 shows the message types with their respective sizes (in bytes) as given by the OpenFlow specification. Although the controller receives ofpt-port-status messages, they have a relatively small impact on its I/O traffic rate: they are mainly generated at the very beginning of the simulation, as TopologyBuilder is building the switch infrastructure, and once a node failure occurs, only a limited number of ofpt-port-status messages are generated. Consequently, these messages are left aside in this analysis.
Message type   FLOW-MOD   PACKET-IN   ERROR-MSG   PORT-STATUS   FLOW-EXP
Size (Bytes)   72         20          12          64            80

Table 5.1: OpenFlow message sizes [31]
Since the controller is internal, the OpenFlowSwitchNetDevice is not required to attach any packet headers (Ethernet and IP) when forwarding a packet to the controller. Thus, an additional 34 bytes (20 bytes IPv4 + 14 bytes Ethernet) are taken into account for each incoming packet.
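One plausible reading of how the rates in Table 5.2 are derived (the exact accounting is not spelled out here) is to multiply, for each message type, the number of messages observed by its size, adding the 34-byte allowance on the input side, and dividing by the 20 s simulation time:

R_packet-in ≈ N_packet-in × (20 + 34) B / 20 s,    R_flow-mod ≈ N_flow-mod × 72 B / 20 s

For instance, a purely hypothetical count of 100,000 packet-in messages over a run would give 100,000 × 54 B / 20 s = 270 kB/s ≈ 0.27 MB/s.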
                   F=0            F=1            F=5            F=10           F=20
FLOW-MOD (MB/s)    1.57 (± 0.13)  1.69 (± 0.16)  3.06 (± 0.15)  5.13 (± 0.21)  5.56 (± 0.24)
PACKET-IN (MB/s)   0.42 (± 0.07)  0.85 (± 0.09)  1.69 (± 0.13)  2.18 (± 0.22)  2.43 (± 0.21)

Table 5.2: Mean I/O traffic rate at the controller level (95% confidence level)
For a given simulation scenario, I/O traffic rates are based on the mean number of corresponding packets. Table 5.2 gives the mean traffic rates in the different scenarios with a confidence level of 95%. The traffic rates reported in this table confirm previous observations at the controller level. As expected, flow-mods and packet-in bandwidth consumption continuously increases over the different scenarios. Then, past a certain point, the bandwidth usage starts to stabilize as switch flow tables overflow. The measurements are primarily focused on flow-mods (output traffic) and packet-in (input traffic), since they represent the clear majority of all exchanged messages.
5.0.5 Discussion
While the simulation measurements hint at opportunities for improvement in terms of failure management, the analysis of the results has to be placed in perspective with regard to the simulation model. The nature of the topology (a data center with hundreds of nodes) and its subsequent traffic characteristics are likely to produce massive amounts of packet-in and flow-mods. Obviously, modeling another type of topology, such as a simple LAN, would have been of little interest, as failure occurrences and traffic overhead are expected to be negligible. In fact, SDN is probably not destined to be globally deployed in the Internet, but rather in WANs and data centers, where it is starting to emerge [49].
Another point to consider is the actual failure occurrence rate in data centers. In the simulation model, an increasing number of node failures are scheduled at random times without relying on any particular statistical model. Another version of the simulation could overcome this model gap by leveraging the work of Gill et al. [36], who estimate and characterize network failures in data centers. They managed to derive, among others, failure probabilities and tendencies for different types of devices such as Top of Rack switches (ToRs), aggregation switches, and load balancers.
Nonetheless, this base study motivates the implementation of MPLS fast reroute as a possible fix to improve failure management in OF-based networks. Ideally, this extension should be simple to manage and bring limited changes to the OF specification. As we will see in the rest of this chapter, different versions of MPLS fast reroute have been considered, each with their own pros and cons.
5.1 Applying MPLS Fast Reroute
RFC 4090 [40] defines two failure recovery methods:
• One-to-one: In this method, each label-switched path (LSP) is backed up by a separate LSP that ultimately converges to the original LSP after the point of failure.
• Many-to-one: Referred to as the facility mode, this failure recovery technique creates a single LSP that will serve as a backup for a set of LSPs. Thus, flows arriving at the Point of local repair (PLR)1 from different sources can be rerouted in bulk to the same backup link.
Both methods require (N −1) backup LSPs for a tunnel made of N nodes, though,
in the second method, a set of LSPs can be mapped to one single backup LSP. In this
thesis, both approaches will be considered.
5.2 One-to-one
The one-to-one method is definitely the most straightforward mechanism to implement. In practice, as shown in Figure 5.4, this version of MPLS fast reroute is achieved by computing and signaling a secondary flow rule for each packet-in (ignoring ARP packets). Naturally, an OF switch should iterate through its backup rules only in the event of a failure. This behavior is guaranteed by decreasing the priority of backup rules. The flow table is supposed to keep its entries ordered, beginning with the highest-priority rule (lowest number). Note that the OpenFlow specification states that the flow priority field is only relevant when employing wildcards. Thankfully, the OF switch implementation used in this project is able to cope with "exact match" entries carrying different priority fields.
Figure 5.4: One-to-one backup path signaling process. A higher priority (i.e., a lower priority field value) guarantees that the regular flow stays on top. After a failure occurs, the regular flow is removed and the switch iterates through its flow table a second time.
1 Switching node that is able to re-direct traffic.
5.2.1 Backup path generation
The NetworkStateView class holds the global view of the network state and is updated whenever a change in the topology is detected (e.g., a node failure). Once the controller receives a packet-in, the backup path is computed by running the Dijkstra shortest path algorithm [11] on the current network state. If the backup path is valid (i.e., it does not involve a backward detour), the controller proceeds by generating a regular flow-mods message with a lower priority field value. Once it receives the flow-mods message, the switch behavior remains unchanged: the flow table adequately re-orders its flow entries if it detects two "exact match" rules with distinct priority fields. Note that the controller is limited to one backup flow per packet-in to avoid excessive resource consumption (e.g., CPU and bandwidth).
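As an illustration, the following sketch mimics this signaling step; the types and helper names are simplified stand-ins, not the actual ns-3 OpenFlow module API.

#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct FlowKey  { uint32_t srcIp; uint32_t dstIp; };
struct FlowRule { uint16_t priority; uint32_t outPort; };

// Flow table ordered by priority; lower number = higher priority, matching
// the convention used in this chapter.
using FlowTable = std::multimap<uint16_t, std::pair<FlowKey, FlowRule> >;

const uint16_t kRegularPriority = 100;
const uint16_t kBackupPriority  = 200;  // iterated only after the regular rule

void InstallRule(FlowTable &table, const FlowKey &key,
                 uint16_t priority, uint32_t outPort) {
  table.emplace(priority, std::make_pair(key, FlowRule{priority, outPort}));
}

// Called for each packet-in (ARP packets ignored upstream): one regular rule
// along the primary path and, if a valid backup path exists, exactly one
// backup rule with a lower priority.
void OnPacketIn(FlowTable &switchTable, const FlowKey &key,
                uint32_t primaryPort, const std::vector<uint32_t> &backupPath) {
  InstallRule(switchTable, key, kRegularPriority, primaryPort);
  if (!backupPath.empty())                 // valid backup (no backward detour)
    InstallRule(switchTable, key, kBackupPriority, backupPath.front());
}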
5.2.2 Backup path selection
Once a failure occurs, the controller receives the usual ofpt-port-status message
that indicates a change in the topology. When MPLS fast reroute is not enabled, this
message is the precursor of an incoming wave of packet-in messages. This time, OF
switches simply iterate through their flow table a second time to determine if a backup
rule is available. The default behavior (i.e., packet-in generation) is triggered if no
backup rule can be found.
5.2.3 Implementation adjustments
Although this first failure recovery method showed encouraging results during the preliminary tests, a few adjustments had to be made. First and foremost, the backup path generation described earlier does not take the network traffic load into account. Thus, as depicted in Figure 5.5, for any (src, dst) pair the shortest path algorithm would always return the same path. Ultimately, the controller was confronted with an irregular number of ofpt-flow-mod-failed messages. Recall that this error message indicates that, due to a lack of space, the corresponding OF switch was unable to insert any additional rules in its flow table.
Figure 5.5: Example of path selection with the Dijkstra SPF algorithm. Any flow that goes from A to B will systematically go through S1, S2 and S3, regardless of the network traffic state. If the path is congested, the controller will receive error messages.
Employing the extended Dijkstra algorithm [50] where each switching node is
associated with a weight corrected this issue. The latter is defined as:
\mathrm{weight}(S) = \frac{\sum_{f \in \mathrm{Flow}(S)} \mathrm{Bits}(f)}{\mathrm{Capability}(S)} \qquad (5.1)
where Bits(f ) represents the number of bits/s of flow f processed by node S and
Capability(S) is the number of bits that S can process. Since all ns-3 OF switches
are perfectly identical in this regard, this weight formula has been adapted to only
consider the number of active flows.
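For reference, Equation 5.1 and its simplified form translate into the following sketch; the names are illustrative.

#include <cstddef>
#include <vector>

struct Flow { double bitsPerSecond; };

// Generic form of Equation 5.1: node load divided by node capability.
double NodeWeight(const std::vector<Flow> &flowsOnNode, double capability) {
  double load = 0.0;
  for (const Flow &f : flowsOnNode) load += f.bitsPerSecond;
  return load / capability;
}

// Simplified form used in the simulations: identical switches, so the
// weight reduces to the number of active flows on the node.
double SimplifiedNodeWeight(std::size_t activeFlows) {
  return static_cast<double>(activeFlows);
}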
The second adjustment concerns the backup rule signaling process. Given the traffic characteristics of the topology (e.g., a high generation rate of short-lived flows), covering the whole backup path is clearly inefficient. Simulations showed that the majority of backup flows turned out to be useless as the protected flow expired within seconds. Predicting the next failure point is obviously not possible, though a failure probability map could be derived from the work of Gill et al. [36] to improve backup rule placement. In the absence of such a statistical model, all nodes are considered as potential failure points, which forces the controller to manage fail-over rules for each packet-in received. In order to limit the amount of traffic, the controller only signals backup flows to the corresponding PLR. This implementation choice is discussed later in this section.
5.2.4 Results
Following the same methodology as described earlier in this chapter, traffic measurements during node failures are again focused on packet-in and flow-mods messages. Figures 5.6(a) and 5.6(b) combine the aggregated traffic measurements when one-to-one is enabled. From a general standpoint, it appears that this version of MPLS fast reroute has nearly no positive effect on switch-to-controller contention. In fact, the only visible effect is a clear increase of flow-mods, since the controller has to compute and forward backup flows for each new packet-in message. This situation is not surprising, as it was anticipated and briefly discussed at the beginning of this chapter. Regardless of the fast reroute mode (i.e., one-to-one or many-to-one), the controller will necessarily generate a higher number of flow-mods messages. However, the intended benefit of implementing fast reroute is to reduce the traffic overhead at the control plane level, preventing a globalized network failure.
Figure 5.6: Cdf of output (flow-mods) and input (packet-in) messages recorded at the control plane, for an increasing number of failures (f = 1, 5, 10): (a) packet-in messages (x 10^3), (b) flow-mods messages (x 10^3). One-to-one has a limited impact on input messages while it heavily affects the output traffic rate.
Figure 5.7 expresses the proportion of backup flows used in each failure scenario. This figure validates the previous concerns, as it reveals that only a small fraction of backup flows is effectively used. The right Y-axis denotes the percentage of backup flow rules that have been used during the simulation, while the left Y-axis denotes the total number of backup flows installed. The usage ratio increases with the number of failures but remains between 10% and 20%. Overall, the various measurements suggest that MPLS one-to-one has only a moderate impact on packet-in as opposed to its flow-mods footprint.
Figure 5.7: Backup flow installation (left Y-axis) and usage (right Y-axis) per switch node, for f = 1, 5 and 10 failures. The hatched bars refer to the backup flow usage ratio (i.e., #matches / #backup-flows). Switches that are not affected by failures are ignored since they all have zero usage rates. The majority of backup flows are never used during the simulations.
Also, as shown in Figure 5.8, implementing fast reroute does not prevent the increase of ofpt-flow-mod-failed messages when a larger number of nodes collapse. As in the previous simulation model, OF switches remain indefinitely inactive after a failure, which ultimately narrows down the number of available routes. Several portions of the topology are thus subject to higher peaks of traffic aggregation. It is interesting to note that, compared to the previous measurements, error messages are less prevalent as routing decisions are now exclusively based on the extended Dijkstra shortest path algorithm. This also explains the decrease of packet-in and flow-mods compared to the previous measurements in Section 5, where OF switches employed the MAC learning protocol after a failure occurred. ARP resolutions are performed as usual when the destination MAC address is unknown.
Figure 5.8: Cdf of error messages (due to a lack of space in switches' flow tables), for f = 1, 5 and 10 failures. A more adequate traffic distribution (obtained with the extended Dijkstra algorithm) has helped reduce error messages.
5.2.5 Discussion
In all fairness, the one-to-one mechanism, while being exceptionally simple to implement, yields very few encouraging results. During each simulation, a large proportion of fail-over rules was wasted as their related flows expired. The effective usage of backup flows mostly depends on flow duration and traffic intensity. A flow that lasts longer has higher chances of being interrupted, and thus re-routed, increasing the backup flow usage rate; in this topology model, however, the vast majority of flows last less than 2 seconds. A second point to consider is traffic intensity: backup flows are installed on edge switches given the topology shape, and the traffic at this topology level is generally lower.
In order to reduce switch-to-controller overhead, the backup flow generation process should not interfere with "regular" flow generation. In this case, backup flows are generated and signaled immediately after a packet-in is received, as the internal controller is not restrained by "physical" limitations. In a real-world situation, the controller is expected to wisely schedule the backup flow signaling process to minimize its CPU and bandwidth consumption.
Another drawback of this fail-over mechanism is its footprint on memory consumption. The absence of a backup flow placement policy implies that a backup flow has to be installed for each new flow. In practice, this doubles the space requirements for any given flow table, which in turn increases flow look-up delays. Clearly, this model is not applicable in a real-life network where the traffic intensity can reach millions of flows per second [20].
The second version of MPLS fast reroute is expected to correct several issues encountered with the one-to-one mechanism. Its ability to reroute packets in bulk with a single backup rule should reduce resource consumption at different levels.
5.3 Many-to-one
While it appears more efficient, this second version of MPLS fast reroute requires additional changes to the OpenFlow specification. In the previous fail-over mechanism, backup flow rules were simply pushed into switches' flow tables, and a lower flow priority guaranteed that backup rules would not overshadow regular flows. In this case, the same trick cannot be employed, since a single backup rule can match several flows. This means that a given switch could potentially "reroute" a new flow without notifying the controller of its existence. Thus, a second flow table, the fast reroute (FRR) table, has been created and is dedicated to backup flows. As for the previous fail-over mechanism, this section begins by presenting the implementation design and concludes with a discussion of the results.
5.3.1 FRR table
This auxiliary flow table represents the major change to OpenFlow specification v0.8.9, as it also requires extending the current set of action commands. Formally, flow-mods messages possess a command field (unsigned integer) that specifies the action to be executed. The main action commands are:
• OFPFC_ADD: Insert in the flow table
• OFPFC_MODIFY: Modify all matching flows
• OFPFC_DELETE: Delete all matching flows
Obviously, these action commands are directly related to the regular flow table.
The logical solution is to extend the existing set with the following flags:
• OFPFC_ADD_FRR: Insert in the FRR table
• OFPFC_MODIFY_FRR: Modify all matching flows in the FRR table
• OFPFC_DELETE_FRR: Delete all matching flows in the FRR table
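A sketch of what the extended command set could look like is given below; the OFPFC_* base values follow the usual OpenFlow numbering, while the *_FRR constants and their numeric values are assumptions made for illustration only.

// Extended flow-mod command set (sketch). The *_FRR values are hypothetical
// and would have to be chosen so as not to clash with future official codes.
enum ofp_flow_mod_command {
  OFPFC_ADD        = 0,   // insert in the regular flow table
  OFPFC_MODIFY     = 1,   // modify all matching flows
  OFPFC_DELETE     = 3,   // delete all matching flows
  OFPFC_ADD_FRR    = 16,  // insert in the FRR table (extension)
  OFPFC_MODIFY_FRR = 17,  // modify all matching flows in the FRR table
  OFPFC_DELETE_FRR = 18   // delete all matching flows in the FRR table
};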
Note: the OpenFlow specification mentions the possibility to request flow table statistics with an ofpst-table request. This feature could also be extended to take FRR table statistics into account, but this was considered of lower importance. Instead, OpenflowStats (created for this purpose) remains the preferred alternative.
5.3.2 Backup path generation
Many-to-one mode is achieved by exploiting wildcard fields. In practice, the backup flow generation process is roughly identical to one-to-one fast reroute: the controller computes the backup flow and generates a flow-mods message if the route is valid. The difference is that every single match field except the destination address is wildcarded. This is done by using the following instruction:

ofm.match.wildcards = htonl(OFPFW_ALL & (~OFPFW_DL_DST))
where htonl() (host to network long) converts host numbers to network byte ordering (i.e., big endian). Once the action command has been set to OFPFC_ADD_FRR, the controller proceeds by sending the message to the corresponding switch. Note that priority fields are completely ignored in this context, since backup rules are now placed in a distinct flow table. Also, backup flows are now set to last longer, since they can potentially match a higher number of flows.
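Putting the pieces together, the construction of a many-to-one backup rule roughly looks like the sketch below. The struct is a simplified stand-in for the real ofp_flow_mod, the wildcard constants are placeholders for the values defined in the OpenFlow header of the targeted version, and OFPFC_ADD_FRR is the hypothetical extension introduced above.

#include <arpa/inet.h>   // htonl(), htons()
#include <cstdint>
#include <cstring>

const uint32_t kWildcardAll   = 0x003fffff;  // placeholder for OFPFW_ALL
const uint32_t kWildcardDlDst = 1u << 3;     // placeholder for OFPFW_DL_DST
const uint16_t kCommandAddFrr = 16;          // placeholder for OFPFC_ADD_FRR

// Simplified stand-in for the relevant ofp_flow_mod fields (not the real
// OpenFlow wire format).
struct SimplifiedFlowMod {
  uint32_t wildcards;
  uint8_t  dl_dst[6];      // destination MAC, the only exact-match field
  uint16_t command;
  uint16_t idle_timeout;   // seconds
  uint16_t hard_timeout;   // seconds
};

SimplifiedFlowMod BuildBackupFlowMod(const uint8_t dstMac[6]) {
  SimplifiedFlowMod ofm{};
  // Wildcard every match field except the destination address.
  ofm.wildcards = htonl(kWildcardAll & ~kWildcardDlDst);
  std::memcpy(ofm.dl_dst, dstMac, 6);
  ofm.command = htons(kCommandAddFrr);
  // Backup rules last longer since a single rule can match many flows; the
  // five-second timeouts follow the empirical setting discussed later on.
  ofm.idle_timeout = htons(5);
  ofm.hard_timeout = htons(5);
  return ofm;
}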
5.3.3 Backup path selection
Even though switches have two flow tables, the same principle is applied when a failure occurs. First, the outdated rule is removed from the regular flow table; then, the switch iterates through its FRR table to determine whether a backup path is available. Similarly to the previous fast reroute mode, a packet-in is generated when no backup rule can be matched.
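The lookup order can be sketched as follows; the table types and keys are simplified stand-ins, not the actual OpenFlowSwitchNetDevice structures.

#include <cstdint>
#include <optional>
#include <unordered_map>

struct Action { uint32_t outPort; };

// Regular flow table keyed on the exact match, FRR table keyed on the
// destination only (the remaining fields are wildcarded).
using ExactTable = std::unordered_map<uint64_t, Action>;
using FrrTable   = std::unordered_map<uint64_t, Action>;

// Lookup order once a failure has removed the outdated regular rule:
// regular table first, then the FRR table; if both miss, the switch falls
// back to its default behavior and generates a packet-in.
std::optional<Action> Lookup(const ExactTable &flowTable,
                             const FrrTable &frrTable,
                             uint64_t exactKey, uint64_t dstKey) {
  if (auto it = flowTable.find(exactKey); it != flowTable.end())
    return it->second;
  if (auto it = frrTable.find(dstKey); it != frrTable.end())
    return it->second;
  return std::nullopt;   // no backup rule: notify the controller
}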
5.3.4 Results
The immediate effect of many-to-one on switch-to-controller traffic is a net decrease of flow-mods messages, as depicted in Figure 5.9(b). In general, simulation measurements report a drop of 20,000 flow-mods compared to one-to-one. In this regard, the specification changes caused by this fail-over mechanism appear as a bargain. Clearly, this drop in flow-mods messages is related to the backup path generation process: the controller initiates only one backup path per destination, as opposed to the previous situation where nearly every packet-in was followed by a backup flow.
Figure 5.9: Cdf of output (flow-mods) and input (packet-in) messages recorded at the control plane, for an increasing number of failures (f = 1, 5, 10): (a) packet-in messages (x 10^3), (b) flow-mods messages (x 10^3). Compared to the previous mode, a net decrease of output messages is observed while input messages are roughly identical.
Sadly, while many-to-one has an improved flow-mods footprint, its impact on packet-in remains marginal, as shown by Figure 5.9(a). At first glance, it appears that both mechanisms have the same packet-in footprint. In fact, many-to-one manages to slightly decrease the overall packet-in generation rate, though this could hardly be considered an improvement. This indicates that, in the current simulation model, both mechanisms are unable to effectively suppress the excess of packet-in during network failures.
Figure 5.10 merges the previous observations with backup flow usage. First, the total number of backup flows is naturally lower than in the previous model, with a drop of 26% in general. Second, backup flow usage now varies between 20% and 35%, which indicates a more adequate usage of the available resources.
Figure 5.10: Backup flow installation (left Y-axis) and usage (right Y-axis) per switch node, for f = 1, 5 and 10 failures. The hatched bars refer to the backup flow usage ratio (i.e., #matches / #backup-flows). Many-to-one improves the usage rate while decreasing backup flow requirements.
The last visible effect of many-to-one is the fall of ofpt-flow-mod-failed messages, as shown in Figure 5.11. This is a direct consequence of the implementation design of many-to-one: since the backup flows are sent to a distinct flow table, switches are now able to insert a higher number of "regular" flows, thus decreasing the risk of table overflow.
Figure 5.11: Cdf of error messages (due to a lack of space in switches' flow tables), for f = 1, 5 and 10 failures. The FRR table relieves the flow table from backup rules, which reduces the risk of overflow.
5.3.5 Discussion
From a general perspective, compared to one-to-one, this second mechanism definitely has a lighter footprint in terms of memory and bandwidth consumption. Its ability to re-route flows in bulk has improved backup flow usage on both ends: first by reducing the required number of backup flows, and then by increasing the usage rate with its wildcard-based rules. It is important to note that the latter are maintained for a longer time period in the FRR table, while one-to-one backup rules were set to expire with their corresponding flow. This implies that the FRR table space complexity is necessarily higher than with one-to-one. Clearly, the controller should adequately configure backup flow timeouts (idle and hard) to avoid any risk of table overflow. Empirical measurements suggested a five-second delay for both timeouts; applying the same parameters outside of this simulation environment would definitely produce different results.
Also, while it is true that this fail-over mode outperforms the previous one, this improvement comes at a price. Unlike the previous mode, the implementation of many-to-one required extending the OpenFlow specification and establishing a distinct flow table for backup rules. If many-to-one were to be fully integrated in the OpenFlow protocol, other segments of the specification would have to be adapted (e.g., the table stats request).
Chapter 6
Refined Topology Model
As mentioned earlier, both MPLS fast reroute variants were unable to fulfill their intended objectives. The many-to-one version managed to lower its flow-mods footprint but barely improved packet-in traffic during network failures. Several reasons can explain this situation. First, recall that the controller does not fully cover the backup path, to avoid excessive (and potentially useless) traffic. Second, backup flows are exclusively installed at edge nodes given the topology shape: higher-level nodes were assured to stay "alive" during the whole simulation, otherwise the topology risked being sliced into two isolated parts. Consequently, the topology model has been refined to alleviate these limitations and provide a more suitable simulation environment.
6.1 Major upgrades
The first obstacle in building a more realistic topology is the absence of a loop prevention algorithm in ns-3. Implementing the Spanning Tree Protocol within the remaining time frame was not feasible. Instead, a temporary fix has been added to OpenflowSwitchNetDevice to avoid flooding the ARP packets generated during each client-server connection setup. Basically, a table records the (MAC src, IPv4 dst) pair of every new ARP packet; if a duplicate is received, the packet is simply discarded1. This solution succeeded in avoiding a network collapse during ARP resolutions, but without fully suppressing "useless" ARP packets. Although this necessarily has an impact on I/O traffic at the data plane, the actual amount of excessive traffic is negligible (fewer than 5 duplicate ARP packets are received per switch).
1 Note that end-host ARP table entries are static, which guarantees that no ARP packets will be wrongfully dropped.
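A minimal sketch of this duplicate-suppression table is shown below; the types are illustrative and do not mirror the actual OpenFlowSwitchNetDevice code.

#include <cstdint>
#include <set>
#include <utility>

struct ArpFilter {
  // Records every (source MAC, destination IPv4) pair already seen.
  std::set<std::pair<uint64_t, uint32_t> > seen;

  // Returns true if the ARP packet should be forwarded (flooded),
  // false if it is a duplicate and must be discarded.
  bool ShouldForward(uint64_t srcMac, uint32_t dstIp) {
    return seen.insert({srcMac, dstIp}).second;
  }
};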
Figure 6.1: Data center topology model. The actual topology is made of 5 core, 25 aggregate and 125 edge nodes. (The Python visualizer module was unable to draw a clean topology structure, as switches were inexplicably aggregated in groups.)
Figure 6.1 portrays the refined topology model. As we can see, end-hosts are still attached to edge switches as in the previous model. The difference is that core and aggregate switches have a higher degree of connectivity. Unlike the previous topology model, node failures can now occur at aggregate as well as core level switches. Note that end-hosts are still unable to re-direct their flows if the first-hop node (edge switch) collapses. Along with this refined topology model, the controller applies a different backup flow placement policy: instead of only covering each PLR, the controller now sends backup flows to each OF switch on the backup path, thus providing full coverage (except for edge switches).
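The refined placement policy can be sketched as follows, with hypothetical types and a placeholder for the actual flow-mod signaling.

#include <cstddef>
#include <cstdint>
#include <vector>

enum class SwitchLevel { Core, Aggregate, Edge };

struct SwitchNode { uint32_t id; SwitchLevel level; };
struct BackupRule { uint32_t dstIp; uint32_t outPort; };

void SendFrrFlowMod(const SwitchNode &node, const BackupRule &rule) {
  // Placeholder for the actual backup flow signaling (OFPFC_ADD_FRR).
  (void)node; (void)rule;
}

// Signal the backup rule to every switch on the backup path, except edge
// switches, which remain uncovered in this model.
void CoverBackupPath(const std::vector<SwitchNode> &backupPath,
                     const std::vector<BackupRule> &perHopRules) {
  for (std::size_t i = 0; i < backupPath.size(); ++i) {
    if (backupPath[i].level == SwitchLevel::Edge)
      continue;
    SendFrrFlowMod(backupPath[i], perHopRules[i]);
  }
}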
6.2 Results
Measurements start with a failure-free scenario, as for the previous topology model. However, in the following analysis, only the many-to-one version of MPLS fast reroute is considered: while its implementation is relatively more intrusive in terms of OpenFlow specification changes, it showed signs of improvement regarding I/O traffic at the control plane. The simulation duration remains at 20 seconds.
Figure 6.2 gathers the measurements performed at the control plane for the different simulation scenarios when many-to-one is not enabled. The main observation is a net decrease of the overall I/O traffic in each scenario compared to the previous topology model. Likewise, the flow-mods proportion remains at the same level as before (twice as many as packet-in messages), since the controller installs a backward rule (i.e., dst to src) for each new flow. It is important to note that, given this topology model, different parameters had to be re-configured. The traffic model had to be adapted to avoid overflowing switches' flow tables; more specifically, this required restraining the application generation rate without modifying flow characteristics (e.g., size, longevity and delay). Also, error messages are now completely negligible since the topology connectivity is higher (i.e., core and aggregate nodes can reach end-hosts using different routes).
Figure 6.2: Preliminary I/O traffic measurements when many-to-one is not enabled, for an increasing number of failures (f = 0, 5, 10, 15): (a) cdf of packet-in messages (x 10^3), (b) cdf of flow-mods messages (x 10^3). Compared to the previous topology model, traffic intensity has dropped by an order of magnitude. Error messages (not shown) are less prevalent given the higher degree of connectivity at core and aggregate switches.
Following the same methodology described at the beginning of this chapter, these preliminary measurements serve as a comparison basis for the next simulation model, where many-to-one is enabled. Figure 6.3 combines the most relevant measurements (ignoring error messages) when many-to-one is activated. Globally, three major points can be observed:
• First, it appears that the number of flow-mods packets generated at the control plane has slightly increased. This is a direct consequence of the backup flow placement policy: in the current model, the controller covers the whole backup path, thus including core and aggregate nodes, which naturally has an effect on I/O traffic.
• Then, despite being pushed to a higher number of nodes, backup flow usage is maintained at a decent level. This implies that the aforementioned rule placement policy is not over-extended. Except for a few isolated cases, the overall usage varies between 18 and 27%. Core nodes naturally have a higher usage since they are less numerous and are positioned at a higher level in the topology.
• Finally, with this new topology model, many-to-one has yet again reduced the number of packet-in messages during network failures. In fact, this improvement is not clearly visible because, in the previous topology model, no packet-in was generated when a flow was re-routed from an edge switch and sent to a higher node level: there was exactly one route connecting a particular (core, aggregate) node pair. In this new model, core and aggregate switches' connectivity has increased, but since they are now included in the backup path, they also remain "silent" when re-routing a flow.
Figure 6.3: I/O traffic measurements at the control plane, for an increasing number of failures (f = 0, 5, 10, 15), and average backup flow usage at core and aggregate nodes: (a) cdf of packet-in messages (x 10^3), (b) cdf of flow-mods messages (x 10^3), (c) backup flow installation and usage per switch node. Results show a consistent improvement in terms of packet-in prevention. Regarding backup flow usage, core switches have higher usage rates since they are placed at a higher level in the topology.
6.3 Discussion
The silver lining of this refined topology model is that the many-to-one version of MPLS fast reroute managed to hold its promises: the increase of flow-mods messages is rightfully compensated by a drop of packet-in. Clearly, there is a trade-off between the actual backup path coverage and the number of packet-in generated during network failures. A full coverage of each backup path would definitely require a larger amount of flow-mods messages, which could potentially overload the controller; the counterpart of this policy is that packet-in during re-routes are entirely suppressed. On the other hand, a marginal coverage of each backup path limits the number of flow-mods, but this inevitably increases packet-in messages. Still, one can argue that, unlike packet-in messages, the controller can dictate the "appropriate" time to populate a backup rule through flow-mods packets. It is true that full coverage requires additional resources, but the controller is not forced to compute and signal a backup rule right after a packet-in is received; instead, this process is expected to be wisely scheduled in order to avoid any risk of overload. In the present situation, the controller is not subject to any such limitations since it remains "invisible" in the simulation model.
Chapter 7
Conclusions
This thesis aimed at studying scalable mechanisms to reduce switch-to-controller overhead in OpenFlow-based SDN networks. In this regard, the study was focused on MPLS fast reroute and its two variants: one-to-one and many-to-one. Although developed for a different purpose, these mechanisms can potentially relieve I/O traffic at the control plane in the event of a failure.
Naturally, other solutions have been explored in this domain. Section 2.2 of Chapter 2 reviewed the main approaches that can effectively improve current SDN architectures in terms of scalability, resiliency and overall efficiency. Two global trends are opposed: distributed and centralized systems. The first implies that the control plane is scattered across multiple nodes, each with its own network view. In the second approach, the control plane can also be scattered across multiple nodes, but the latter appear as a single entity. Distributed systems (and centralized ones whose control plane is distributed) require an East/West API to facilitate inter-controller communication, which can bring another layer of complexity to the SDN standardization process.
Other alternatives such as Kandoo [33] and DevoFlow [20] have adopted a different approach. Kandoo advocates an optimization-driven design with parallel computing that scales with the number of cores. DevoFlow shifts the control logic to the data plane for short-lived flows with the intent to reduce switch-to-controller traffic. Both alternatives have limitations in terms of applicability (e.g., financial cost and complexity for Kandoo, and an ambiguous control layer abstraction for DevoFlow). MPLS fast reroute is also orthogonal to the control plane architecture design (centralized or distributed) but, unlike DevoFlow, the controller remains fully in charge of routing decisions regardless of the flow longevity.
7.1 Main Lessons
7.1.1 One-to-one
In this mode, every packet-in (ignoring ARP packets) was backed with a fail-over rule. Implementing one-to-one was remarkably simple, as it required no additional changes to the OpenFlow specification. However, this factor cannot fully compensate for its poor contribution. Results showed that one-to-one is unable to suppress the excess of traffic during network failures. The controller wasted a relatively important amount of resources to compute and populate backup flows that were ultimately left unused. In the meantime, this also increased the number of error messages related to switch table overflows, though the extended Dijkstra shortest path algorithm mitigated a large portion of these errors, as routing decisions were also based on the network traffic state.
7.1.2 Many-to-one
This second variant of MPLS fast reroute outperformed one-to-one in terms of resource consumption, first by reducing the output traffic (flow-mods) and then by decreasing backup flow requirements. Also, since wildcard-based rules are expected to match a higher number of flows, they were set to last longer (higher idle and hard timeouts in switches' FRR table). The same improvement could not be observed for input traffic. Recall that the fail-over policy in both one-to-one and many-to-one was to cover only the PLR, which explains why the controller keeps receiving packet-in for re-routed flows. This policy was considered adequate in an effort to reduce "useless" backup flow signaling. In the refined topology model, this policy was modified in order to cover a longer chunk of the backup paths; in practice, this was achieved by populating fail-over rules to core and aggregate switches. Results showed encouraging improvements in terms of packet-in prevention during failures.
7.1.3 Applicability
The intent of the simulation model developed in this thesis was to mimic data center network traffic. The measurements and discussions are therefore limited in scope with respect to the simulation environment. It is true that SDN deployment is not restricted to dense networks. However, as explained earlier, modeling another type of network such as a small LAN (dozens of nodes) would have been of little interest. A simple controller such as NOX [27] is able to cope with a rate of 30,000 flows/s, which should be far beyond the requirements of small and moderate networks.
7.2 Limitations and Future Work
During the course of this thesis project, different obstacles had to be overcome in order to obtain a valid OpenFlow network model. Of course, regardless of its level of sophistication, the resulting model is still an abstraction of reality and, as such, remains subject to limitations. Nevertheless, this section highlights the most critical aspects of the simulation model that would be further refined in a follow-up research.
7.2.1 The controller
The controller was undeniably at the core of this study, and measurements were primarily focused on I/O traffic at the control plane. Sadly, the OpenFlow module developed by Blake Hurd was limited to an internal controller reduced to its simplest form of expression (a MAC-learning and a drop controller). Even though it was later extended with other features during this project, it remains "invisible" in the topology model. In this context, the traffic rate at the control plane had to be inferred from the OF packet sizes (including the missing IP/Ethernet headers). This particularity of the controller also implies that the latter cannot be subject to congestion or even drop packets along the way, which could hide flaws in the implementation design. This model gap could be fixed with EmuNetDevice, which stands for "Emulated Network Device". Through EmuNetDevice, a virtual node in the simulation environment (e.g., an OF switch) can send packets to a real network device. An existing OpenFlow-compliant controller would have to be installed on real hardware with its specific port interface. The EmuNetDevice module in ns-3 requires this interface to be configured in promiscuous mode (i.e., all traffic is sent to the CPU) to avoid interference between the host machine and ns-3 IP stacks. Also, simulations would have to be run in real time to allow communications with the "real" controller.
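As a pointer for such future work, the following sketch loosely follows the classic ns-3 emu examples to bind a simulated node to a real interface; the header names, helper API and device name are assumptions that may differ across ns-3 versions.

#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/emu-module.h"

using namespace ns3;

int main (int argc, char *argv[])
{
  // Real-time simulation and real checksums are required to exchange
  // packets with a real controller outside the simulator.
  GlobalValue::Bind ("SimulatorImplementationType",
                     StringValue ("ns3::RealtimeSimulatorImpl"));
  GlobalValue::Bind ("ChecksumEnabled", BooleanValue (true));

  NodeContainer node;
  node.Create (1);                       // e.g., the OF switch node

  // Bind the emulated device to the host interface facing the controller;
  // "eth0" is an assumed name and must be set to promiscuous mode on the host.
  EmuHelper emu;
  emu.SetAttribute ("DeviceName", StringValue ("eth0"));
  NetDeviceContainer devices = emu.Install (node.Get (0));

  Simulator::Stop (Seconds (20.0));
  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}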
7.2.2 Topology model
The topology plays a key role in the simulation model. In this project, a large network (hundreds of switches and end-hosts) was modeled, with end-hosts attached at the edge of the network as in data center topologies. Ideally, the topology should be based on real-life network layouts; this would strengthen the simulation model and allow meaningful analysis of traffic measurements. Without a network loop prevention algorithm such as the one provided by the Spanning Tree Protocol, the layout design was limited.
A temporary hack was found in OpenFlowSwitchNetDevice by simply ignoring consecutive ARP packets for the same destination IP address. It should be noted that broadcast applications were not taken into account in the traffic model since they could also collapse the entire network. A permanent solution would be to implement the Spanning Tree Protocol, though this could result in a painful debugging process as it requires modifying core ns-3 modules.
7.2.3 Traffic model
The traffic and topology models are complementary. Thus, the traffic characteristics should match real-life properties of data centers in terms of flow size, duration and generation delay. With their comprehensive analysis, Benson et al. [47] provided the building blocks of the traffic model. However, network applications were limited to UDP client-server types, as TCP applications give rise to micro-flows during the synchronization setup (SYN and SYN ACK). Obviously, installing backup flows for these micro-flows with MPLS fast reroute would have been impractical. Also, broadcast applications were forbidden since they could potentially disrupt the entire network communication flow.
7.2.4 Failure model
This final point is as essential as the previous ones. For each failure scenario, switches were set to collapse at a random time without relying on any particular statistical model. A refined version of the network failure model could be obtained by leveraging the work of Gill et al. [36]: core, aggregate and edge switches would follow different failure policies based on ground-truth statistics. This could also serve the backup flow placement policy, since the controller would be able to determine the probability that a given switch will collapse.
Finally, it would be interesting to study the use of both variants of MPLS fast reroute when the network traffic is saturated. It is believed that the many-to-one version could be enabled in order to maintain flow QoS; if the results are conclusive, this would radically increase the contribution of this fail-over mechanism.
Appendix A
UML Class Diagram
Figure A.1: High level UML class diagram
Figure A.1 gives the most essential classes used during the simulations. The reported class methods and attributes have been reduced to the strict minimum to ensure readability. OpenFlowSwitchNetDevice and LearningController are part of the OpenFlow module: the former models a forwarding device, while the LearningController class models a simple SDN controller that uses the MAC learning protocol during flow setups. During this thesis project, these two classes have been extended to provide MPLS fast reroute capabilities.
The FRRController updates the NetworkStateView whenever a change in the topology is detected. In practice, when a forwarding device collapses, the controller is expected to receive port-status messages; from there, the controller notifies the event to its NetworkStateView. Additionally, packet-in and flow-expired messages are signaled to the NetworkStateView to update the node weight (number of active flows) associated with each forwarding device.
The EventScheduler and TopologyBuilder are the other two essential classes in the simulation model. RandomTrafficGenerator and RandomFailureGenerator are responsible (respectively) for traffic generation and node failure management. Both classes can be configured through their specific attributes, as recommended by the ns-3 guidelines manual1.
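For illustration, such attribute-based configuration could look like the sketch below; the attribute names, default values and members are hypothetical, only the TypeId pattern follows the ns-3 attributes manual.

#include <cstdint>
#include "ns3/object.h"
#include "ns3/uinteger.h"
#include "ns3/double.h"

using namespace ns3;

class RandomFailureGenerator : public Object
{
public:
  static TypeId GetTypeId (void)
  {
    static TypeId tid = TypeId ("RandomFailureGenerator")
      .SetParent<Object> ()
      .AddConstructor<RandomFailureGenerator> ()
      .AddAttribute ("NumFailures",                    // hypothetical name
                     "Number of switch failures to schedule",
                     UintegerValue (5),
                     MakeUintegerAccessor (&RandomFailureGenerator::m_numFailures),
                     MakeUintegerChecker<uint32_t> ())
      .AddAttribute ("MaxFailureTime",                 // hypothetical name
                     "Latest time (s) at which a failure may occur",
                     DoubleValue (20.0),
                     MakeDoubleAccessor (&RandomFailureGenerator::m_maxTime),
                     MakeDoubleChecker<double> ());
    return tid;
  }

private:
  uint32_t m_numFailures;
  double   m_maxTime;
};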
The comprehensive documentation page related to each class developed in this thesis project can be accessed through this link:
www.tinyurl.com/docu-tfe-sdn
The complete source code can also be retrieved from this web page:
www.tinyurl.com/tfe-sdn
Backup source code link:
www.dropbox.com/sh/8r9o8bndgbnfs5i/AAD4ItPcSG9rLneUT8kd1aEZa?dl=0
Backup documentation link:
www.dl.dropboxusercontent.com/u/50583281/html/index.html
1 https://www.nsnam.org/docs/manual/html/attributes.html
Bibliography
[1] Monaco, M. and Michel, O. and Keller, E. Applying Operating System Principles
to SDN Controller Design. In Proc. ACM SIGCOMM, HotNets, November 2013.
[2] Foster, N. and Harrison, R. and Freedman, M. and Monsanto, C. and Rexford,
J. and Story, A. and Walker, D. Frenetic: A network programming language. In
Proc. ACM SIGPLAN, November 2011.
[3] Heller, B. and Seetharaman, S. and Mahadevan, P. and Yiakoumis, Y. and
Sharma, P. and Banerjee, S. and McKeown, N. Elastictree: Saving energy in
data center networks. In Proc. USENIX Conference on Networked Systems Design and Implementation, October 2010.
[4] Ghaffarinejad, A. and Syrotiuk, V.R. Load Balancing in a Campus Network
Using Software Defined Networking. In Proceedings of Research and Educational
Experiment Workshop, March 2014.
[5] Wang, R. and Butnariu, D. and Rexford, J. Openflow-based server load balancing gone wild. In Proc. USENIX Conference on Hot Topics in Management
of Internet, Cloud, and Enterprise Networks and Services, Hot-ICE, November
2011.
[6] Leiner, B. and Cerf, V. and Clark, D. and Kahn, R. and Kleinrock, L. and Lynch,
D. and Postel, J. and Roberts, L. and Wolff, S. A Brief History of the Internet.
SIGCOMM Computer Communication Review, 39(5):22–31, October 2009.
[7] Case, J. D. and Fedor, M. and Schoffstall, M. L. and Davin, J. Simple Network
Management Protocol (SNMP). RFC 1157, IETF, May 1990.
[8] Schoenwaelder, J. Overview of the 2002 iab network management workshop.
RFC 3535, IETF, May 2003.
[9] Handigol, N. and Heller, B. and Jeyakumar, V. and Maziéres, D. and McKeown,
N. Where is the debugger for my software-defined network?
In Proc. ACM
SIGCOMM, HotSDN, August 2012.
[10] Moy, J. OSPF Version 2. RFC 2328, IETF, June 1998.
[11] Dijkstra, E.W. A note on two problems in connexion with graphs. Numerische
Mathematik, 1:269–271, January 1959.
[12] Hedrick, C. L. Routing Information Protocol. RFC 1058, IETF, April 1988.
[13] Carpenter, B. and Brim, S. Middleboxes: Taxonomy and issues. RFC 3234,
IETF, February 2002.
[14] Botero, J.F. and Hesselbach, X. The bottlenecked virtual network problem in
bandwidth allocation for network virtualization. In Proc. IEEE LATINCOM,
September 2009.
[15] Diot, C. and Levine, B. N. and Lyles, B. and Kassem, H. and Balensiefen,
D. Deployment issues for the ip multicast service and architecture. Network
Magazine of Global Internet Networking, 14(1), January 2000.
[16] McKeown, N. and Anderson, T. and Balakrishnan, H. and Parulkar, G. and
Peterson, L. and Rexford, J. and Shenker, S. and Turner, J. OpenFlow: Enabling
Innovation in Campus Networks. SIGCOMM Computer Communication Review,
38(2), March 2008.
[17] Tate, J. and Easterly, M. IBM System Networking RackSwitch G8264: User
guide. IBM, February 2011.
[18] Brocade Communications Systems. Brocade MLX Series Architecture. Technical
Report GA-WP-1370-05, March 2013.
[19] Hewlett Packard. HP 8200 ZL Switch Series. Technical Report c04111638-DA12862, December 2014.
[20] Curtis, A. and Mogul, J-C. and Tourrilhes, J. and Yalagandula, P. and Sharma,
P. and Banerjee, S. DevoFlow: Scaling Flow Management for High-performance
Networks. In Proc. ACM SIGCOMM, August 2011.
[21] Feamster, N. and Rexford, J. and Zegura, E. The road to sdn. Queue, 11(12),
December 2013.
[22] Doria, A. and Hadi Salim, J. and Haas, R. and Khosravi, H. and Wang, W.
and Dong, L. Forwarding and Control Element Separation (ForCES) Protocol
Specification. RFC 6041, IETF, October 2010.
[23] Macedonia, M.R. and Brutzman, D.P. Mbone provides audio and video across
the internet. Computer, 27(4):30–36, April 1994.
[24] Fink, R. and Hinden, R. 6Bone (IPv6 Testing Address Allocation) Phaseout.
RFC 2772, IETF, February 2000.
[25] Koponen, T. and Casado, M. and Gude, N. and Stribling, J. and Poutievski,
L. and Zhu, M. and Ramanathan, R. and Iwata, Y. and Inoue, H. and Hama,
T. and Shenker, S. Onix: A distributed control platform for large-scale production networks. In Proc. USENIX Conference on Operating Systems Design and
Implementation, OSDI, October 2010.
[26] Handigol, H. and Seetharaman, S. and Flajslik, M. and Gember, G. and Mckeown, N. and Parulkar, G. and Akella, A. and Feamster, N. and Clark, R. and
Krishnamurthy, A. and Brajkovic, V. and Anderson, T. Aster*x: Load-Balancing
Web Traffic over Wide-Area Networks. In Proceedings of GENI Engineering Conference, September 2010.
[27] Tavakoli, A. and Casado, M. and Koponen, T. and Shenker, S. Applying NOX
to the Datacenter. In Proc. ACM SIGCOMM, HotNets, October 2009.
[28] Zhang, Y. and Natarajan, S. and Huang, X. and Beheshti, N. and Manghirmalani, R. A compressive method for maintaining forwarding states in sdn
controller, August 2014.
[29] Heller, B. and Sherwood, R. and McKeown, N. The controller placement problem. In Proc. ACM SIGCOMM, HotSDN, August 2012.
[30] Erickson, D. The Beacon OpenFlow controller. In Proc. ACM SIGCOMM, HotSDN, August 2013.
[31] Heller B. OpenFlow Switch Specification Version 0.8.9. Technical report, Open
Networking Foundation (ONF), December 2008.
[32] Casado, M. and Freedman, M. J. and Pettit, J. and Luo, J. and McKeown, N.
and Shenker, S. Ethane: Taking control of the enterprise. SIGCOMM Computer
Communications Review, 37(4):1–12, August 2007.
[33] Hassas Yeganeh, S. and Ganjali, Y. Kandoo: A framework for efficient and
scalable offloading of control applications. In Proceedings of the First Workshop
on Hot Topics in Software Defined Networks, HotSDN, August 2012.
[34] Cai, Z. Maestro: Achieving Scalability and Coordination in Centralized Network
Control Plane. PhD thesis, Rice University, Houston, TX, USA, October 2012.
[35] Rosen, E. and Viswanathan, A. and Callon, R. Multiprotocol label Switching
Architecture. RFC 3031, IETF, January 2001.
[36] Gill, P. and Jain, N. and Nagappan, N. Understanding Network Failures in Data
Centers: Measurement, Analysis, and Implications. In Proc. ACM SIGCOMM,
August 2011.
[37] Bonica, R. and Gan, D. and Pignataro, C. ICMP extensions for multiprotocol
label switching. RFC 4950, Internet Engineering Task Force, August 2007.
[38] Donnet, B. and Luckie, M. and Mérindol, P. and Pansiot, J-J. Revealing MPLS Tunnels Obscured from Traceroute. ACM SIGCOMM Computer Communication Review, 42(2):87–93, March 2012.
[39] Andersson, L. and Minei, I. and Thomas, B. Label Distribution Protocol Specification. RFC 5036, Internet Engineering Task Force, October 2007.
[40] Pan, P. and Swallow, G. and Atlas, A. Fast Reroute Extensions to RSVP-TE
for LSP Tunnels. RFC 4090, IETF, May 2005.
[41] Awduche, D. and Berger, L. and Gan, D. and Li, T. and Srinivasan, V. and
Swallow, G. RSVP-TE: Extensions to RSVP for LSP Tunnels. RFC 3209,
IETF, December 2001.
[42] Riley, G. and Henderson, T. The ns-3 Network Simulator. In K. Wehrle, M. Güneş,
and J. Gross, editors, Modeling and Tools for Network Simulation, pages 15–34.
Springer Berlin Heidelberg, June 2010.
[43] Pelkey, J. Ns-3 OpenFlow Switch Support Repository. http://code.nsnam.
org/jpelkey3/openflow, 2011.
[44] OpenFlow Switch Support Documentation.
http://www.nsnam.org/docs/
release/3.13/models/html/openflow-switch.html, 2011.
[45] Spring, N and Mahajan, R. and Wetherall, D. Measuring ISP Topologies with
Rocketfuel. In Proc. ACM SIGCOMM, August 2002.
[46] Mahadevan, P. and Hubble, C. and Krioukov, D. and Huffaker, B. and Vahdat, A. Orbis: Rescaling Degree Correlations to Generate Annotated Internet
Topologies. In Proc. ACM SIGCOMM, July 2007.
[47] Benson, T. and Akella, A. and Maltz, D. A. Network Traffic Characteristics of
Data Centers in the Wild. In Proc. ACM SIGCOMM, September 2010.
[48] Ammar, D. and Begin, T. and Guerin-Lassous, I. A New Tool for Generating
Realistic Internet Traffic in NS-3. In Proc. International ICST Conference on
Simulation Tools and Techniques, SIMUTools, March 2011.
[49] Levin, D. and Canini, M. and Schmid, S. and Feldmann, A. Incremental SDN
Deployment in Enterprise Networks. In Proc. ACM SIGCOMM, August 2013.
[50] Yahya, W. and Basuki, A. and Jiang, J. R. The Extended Dijkstra's-based
Load Balancing for OpenFlow Network. International Journal of Electrical and
Computer Engineering (IJECE), 5(2):289–296, April 2015.