Data Center Layer 2 High Availability: Validated Design Guide
Deploying a Data Center to Support Layer 2 Services with Network Switches Running Cumulus Linux®

Contents

Layer 2 Networks with Cumulus Linux
  Objective
  Enabling Choice of Hardware in the Data Center
  Driving Towards Operational Efficiencies
  Intended Audience for Network Design and Build
Understanding Layer 2 Architecture
  Network Architecture and Design Considerations
  Management and Out-of-Band Networking Considerations
  Scaling-out the Architecture
External Connectivity
  External Connectivity at Layer 2
  External Connectivity at Layer 3
Implementing a Layer 2 Data Center Network with Cumulus Linux
  Network Architecture for Hierarchical Layer 2 Leaf, Spine, and Core
  Build Steps
    1. Set Up Physical Network and Basic Configuration of All Switches
    2. Configure Leaf Switches
    3. Configure Spine Switches
    4. Set Up Spine/Leaf Network Fabric
    5. Configure VLANs
    6. Connect Hosts
    7. Connect Spine Switches to Core
Automation Considerations
  Automation and Converged Administration
  Automated Switch Provisioning with Zero Touch Provisioning
  Network Configuration Templates Using Mako
  Automated Network Configuration Using Ansible
  Automated Network Configuration Using Puppet
Operational and Management Considerations
  Authentication and Authorization
  Security Hardening
  Accounting and Monitoring
  Quality of Service (QoS) Considerations
  Link Pause
Conclusion
  Summary
  References
Appendix A: Example /etc/network/interfaces Configurations
  leaf01
  leaf02
  spine01
  spine02
  oob-mgmt
Appendix B: Network Design and Setup Checklist

Version 1.0.0
February 2, 2016

About Cumulus Networks
Unleash the power of Open Networking with Cumulus Networks. Founded by veteran networking engineers from Cisco and VMware, Cumulus Networks makes the first Linux operating system for networking hardware, filling a critical gap in realizing the true promise of the software-defined data center. Just as Linux transformed the economics and pace of innovation on the server side of the data center, Cumulus Linux is doing the same for the network, radically reducing the cost and complexity of operating modern data center networks for service providers and businesses of all sizes. Cumulus Networks has received venture funding from Andreessen Horowitz, Battery Ventures, Sequoia Capital, Peter Wagner, and four of the original VMware founders. For more information, visit cumulusnetworks.com or follow @cumulusnetworks.

©2016 Cumulus Networks. CUMULUS, the Cumulus Logo, CUMULUS NETWORKS, and the Rocket Turtle Logo (the "Marks") are trademarks and service marks of Cumulus Networks, Inc. in the U.S. and other countries. You are not permitted to use the Marks without the prior written consent of Cumulus Networks. The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. All other marks are used under fair use or license from their respective owners.

Layer 2 Networks with Cumulus Linux

Objective
This Validated Design Guide presents a design and implementation approach for deploying layer 2 networks on switches running Cumulus Linux.

Enabling Choice of Hardware in the Data Center
Server virtualization revolutionized the data center by giving IT the freedom to choose industry-standard server hardware, lowering CapEx, and by fostering a rich ecosystem of automation tools and services for provisioning and managing compute and storage infrastructure, lowering OpEx.
This same benefit of choice is now available for networking in the data center. With Cumulus Linux, network administrators have a multi-platform, multi-vendor network OS that provides freedom of choice in network switch hardware. Because Cumulus Linux is Linux, data center administrators can tap the established wealth of Linux knowledge and its vast application ecosystem for administering servers, and extend both to network switches for converged administration. Cumulus Linux can help you achieve the same CapEx and OpEx efficiencies for your networks by enabling an open market approach for switching platforms and by offering a radically simple lifecycle management framework built on the industry's best open source tools. By using bare metal servers and network switches, you can achieve cost savings that would have been impossible just a few years ago.

Driving Towards Operational Efficiencies
With a disaggregated hardware and software model, Cumulus Networks has created a simple approach for managing complex layer 2 networks. Cumulus Linux simplifies network management through hardware agnosticism and the use of standard Linux automation and orchestration tools. Cumulus Linux is a full-featured network operating system that presents a standard interface regardless of the underlying hardware or the features enabled. Automation tools can process templates that are written once and reused across the entire environment for provisioning and configuration management, substantially decreasing the number of upper-level management systems and operating expenses while increasing the speed at which new networks are deployed. You can change key variables in a template as needed and power on an entirely new network in minutes without interacting with a CLI. You can leverage open source protocols and tools such as DHCP, the Open Network Install Environment (ONIE), and zero touch provisioning, along with the automation and orchestration tools of your DevOps team's choice, such as Ansible, Chef, Puppet, Salt, and CFEngine. Many organizations already use these same tools to simplify server deployments, and adapting them to provision entire racks of network switches becomes a simple task of converged administration.

Intended Audience for Network Design and Build
The intended audience for this guide is a data center cloud architect or administrator who is experienced with server technologies and familiar with layer 2 networking, including interfaces, link aggregation (LAG) or bonds, and VLANs. The network architecture and build steps provided in this document can be used as a reference for planning and implementing layer 2 with Cumulus Linux in your environment. A basic understanding of Linux commands is assumed, such as accessing a Linux installation, navigating the file system, and editing files. If you need to learn more about Linux, review our Introduction to Linux videos at cumulusnetworks.com/technical-videos/ for a high-level overview. If you are using this guide to help you set up your Cumulus Linux environment, we assume you have Cumulus Linux installed and licensed on switches from the Cumulus Linux Hardware Compatibility List (HCL) at cumulusnetworks.com/hcl. Additional information on Cumulus Linux software, licensing, and supported hardware may be found on cumulusnetworks.com or by contacting [email protected].
Understanding Layer 2 Architecture

Network Architecture and Design Considerations
Many applications in the data center require layer 2 adjacency, where the application assumes other components or services are on the same IP subnet. In this environment, a gateway is assumed to be needed only to route traffic between domains with different IP subnets. An example is a virtual environment in which all of the VMs are expected to talk to each other without needing a router.

Figure 2 shows the network design of a typical enterprise data center pod running a virtual environment or container environment. The pod consists of a pair of aggregation/spine switches connected to one or more pairs of access/leaf switches, which in turn provide dual-homed, highly available connectivity to hosts and storage elements.

Figure 2. Traditional Layer 2 Hierarchical Enterprise Data Center Network Pod

This document describes the network architecture for a layer 2 data center that follows the traditional hierarchical aggregation and access structure, also called spine and leaf, where layer 2 networking is used to connect all elements. For optimal network performance, hosts are connected via dual 10G links to the access/leaf switch layer, which in turn is connected via 40G links to the aggregation/spine layer. Representative spine and leaf switches running Cumulus Linux are shown below in Figure 3, although the details of your specific models may vary. Spine switches typically have thirty-two 40G switch ports, whereas leaf switches have forty-eight 10G access ports and up to six 40G uplinks. In Cumulus Linux, switch ports are labeled swp1, swp2, swp3, and so forth.

Figure 3. Switch Port Numbering for Aggregation/Spine and Access/Leaf Switches

The network design employs multi-chassis link aggregation (MLAG) for network path redundancy and link aggregation for network traffic optimization. MLAG allows active/active forwarding across a pair of physical switches and avoids the underutilized ports that result when Spanning Tree Protocol (STP) deliberately places links in a blocking state to prevent loops. MLAG is deployed using the following steps:

• Select a pair of physical switches, either a pair of leaf switches or a pair of spine switches. A single MLAG "logical switch" configuration can contain only two switches, although you can have multiple MLAG pairs in a topology.
• Establish a peer link between the pair members. Setting up the peer link as an LACP bond is recommended for increased reliability and bandwidth.
• A clagd daemon runs on each switch in the MLAG pair and communicates with its peer over a subinterface of the peer link.

Figure 4. Switch-to-Switch MLAG

The peer link between the switches in an MLAG pair should be sized to ensure there is sufficient bandwidth to handle the additional traffic that arrives during uplink failure scenarios. Typical deployments use a bond for the peer link to satisfy both bandwidth and redundancy requirements.

Figure 5. Host Link Redundancy and Aggregation, LACP
Figure 6. Host Network Segmentation

Figure 6 shows the high availability (HA) component of the proposed layer 2 solution.
As individual links are added to or removed from the LACP uplink group, traffic from applications such as databases or web servers continues to flow uninterrupted to and from the host. The number of links in the LACP uplink group determines the bandwidth available to applications; with high availability in place, the only impact of a link failure on a host is the loss of that link's share of the total available bandwidth.

Management and Out-of-Band Networking Considerations
An important supplement to the high-capacity production data network is the management network used to administer infrastructure elements such as network switches, physical servers, and storage systems. The architecture of these networks varies considerably based on their intended use, the elements themselves, and access isolation requirements. This guide assumes that a single layer 2 domain is used to administer the network switches and storage elements. These operations include imaging the elements, configuring them, and monitoring the running system. Some installations also use this network for IPMI (also known as DRAC or iLO) access to the hosts. This network is expected to host both DHCP and HTTP servers, such as isc-dhcp and apache2, and to provide forward and reverse DNS resolution. In general, these networks provide some means to connect to the corporate network, typically through a router or jump host. Figure 7 shows the logical and, where possible, physical connections of each element, as well as the services required to realize this deployment.

Figure 7. Out-of-Band Management

Scaling-out the Architecture
Scaling out the architecture involves adding more hosts to the access switch pairs, and then adding more access switches in pairs as needed, as shown in Figure 8.

Figure 8. Adding Additional Switches

As the spine switch pair approaches its capacity limit, an additional network pod of spine/leaf switches may be added, as shown in Figure 9. The only constraint is that in a layer 2-only environment, additional spine switches should be added in pairs.

Figure 9. Adding Network Pods/Server Clusters

External Connectivity
The spine switches can connect to core switches in one of two ways, depending on where they sit relative to the layer 2/layer 3 boundary:

• External connectivity at layer 2, with routing and gateway services provided outside the cluster
• External connectivity at layer 3, with routing services provided by the spine switches

External Connectivity at Layer 2
In this scenario, the spine switches connect to core switches that handle gateway services, as shown in Figure 10.

Figure 10. External Connectivity at Layer 2

When connecting to core switches at layer 2, the core switches need to support a vendor-specific form of multi-chassis link aggregation (such as vPC, MC-LAG, or MLAG) with which the spine switches can pair. If the core switches are not capable of MLAG, keep in mind that with the VLAN-aware bridge introduced in Cumulus Linux 2.5, a single instance of spanning tree runs on each switch. By default, BPDUs are sent on the native VLAN 1 when spanning tree is enabled on the VLAN-aware bridge; the native VLAN should match on the core switches to ensure spanning tree compatibility. To adjust the untagged or native VLAN for an interface, refer to the Configure VLANs step in the Build Steps later in this document. In addition, all VLANs that are to be trunked to the core should be allowed on all the uplink trunk interfaces to avoid potentially blocked VLANs.
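To make these two requirements concrete, the fragment below shows where each lands in the VLAN-aware bridge stanza covered in the Configure VLANs build step. This is a minimal sketch only; the core-uplink and downlink bond names are assumptions taken from the spine interface tables later in this guide:

auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports core-uplink downlink1 downlink2 peerlink
    # every VLAN trunked to the core must remain in this list
    bridge-vids 10,15,20-23,30,40
    # untagged/native VLAN; must match the native VLAN on the core switches
    bridge-pvid 1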
External Connectivity at Layer 3
In this scenario, the spine switches connect at layer 3, as shown in Figure 11. Alternatively, the spine switches can be dual connected to each core switch at layer 3 (not shown in Figure 11).

Figure 11. External Connectivity at Layer 3

In this design, the spine switches route traffic. To connect to the core switches, you will need to determine whether the routing is static or dynamic and, if dynamic, whether the protocol is OSPF or BGP. In addition, you will need to provide a gateway and routing between all layer 2 subnets on the spine switches. Additional network design considerations for this scenario include setting up Virtual Router Redundancy (VRR) between the spine switch pair to support active/active forwarding. For more information about VRR, read the Cumulus Linux documentation.

Implementing a Layer 2 Data Center Network with Cumulus Linux

Network Architecture for Hierarchical Layer 2 Leaf, Spine, and Core
The instructions that follow detail the steps for building a representative network that provides layer 2 connectivity between servers using switches running Cumulus Linux. The actual configurations reference the following network topology.

Figure 12. Network Topology

The details for the switches, hosts, and logical interfaces are presented in the tables below. The steps are assembled sequentially, so that each step builds on the previous one and includes only the portions of the configuration relevant to that step. If at any point it is unclear what configuration should be present, Appendix A includes the complete configurations for all the leaves and spines; you can copy these configurations directly to a test environment. The build steps demonstrate why each piece of configuration is necessary.
leaf01

Connected To             Logical Interface   Description                                     Physical Interfaces
leaf02                   peerlink            Peer bond used for MLAG traffic                 swp47, swp48 (swp45, swp46 reserved for expansion)
leaf02                   peerlink.4094       Subinterface used for clagd communication       N/A
spine01, spine02         uplink1             MLAG bond to the spine01/spine02 pair           swp49, swp50
future use               uplink2             Reserved for additional connections to spines   swp51, swp52
multiple hosts           access ports        Connect to hosts and storage                    swp1 through swp44
host-01                  host-01             Bond to host-01 for host-to-switch MLAG         swp1
out-of-band management   N/A                 Out-of-band management interface                eth0

leaf02

Connected To             Logical Interface   Description                                     Physical Interfaces
leaf01                   peerlink            Peer bond used for MLAG traffic                 swp47, swp48 (swp45, swp46 reserved for expansion)
leaf01                   peerlink.4094       Subinterface used for clagd communication       N/A
spine01, spine02         uplink1             MLAG bond to the spine01/spine02 pair           swp49, swp50
future use               uplink2             Reserved for additional connections to spines   swp51, swp52
multiple hosts           access ports        Connect to hosts and storage                    swp1 through swp44
host-01                  host-01             Bond to host-01 for host-to-switch MLAG         swp1
out-of-band management   N/A                 Out-of-band management interface                eth0

leaf0N
Repeat the above configuration for each additional pair of leaf switches.

spine01

Connected To             Logical Interface   Description                                     Physical Interfaces
spine02                  peerlink            Peer bond used for MLAG traffic                 swp25, swp26
spine02                  peerlink.4093       Subinterface used for clagd communication       N/A
leaf01, leaf02           downlink1           MLAG bond to the leaf01/leaf02 pair             swp1, swp2
core1, core2             core-uplink         Bond to core switches                           swp31, swp32
out-of-band management   N/A                 Out-of-band management interface                eth0

spine02

Connected To             Logical Interface   Description                                     Physical Interfaces
spine01                  peerlink            Peer bond used for MLAG traffic                 swp25, swp26
spine01                  peerlink.4093       Subinterface used for clagd communication       N/A
leaf01, leaf02           downlink1           MLAG bond to the leaf01/leaf02 pair             swp1, swp2
core1, core2             core-uplink         Bond to core switches                           swp31, swp32
out-of-band management   N/A                 Out-of-band management interface                eth0

Build Steps
The steps for building out a layer 2 network environment with switches running Cumulus Linux are as follows.

Physical Network
1. Set up physical network and basic configuration of all switches.
   • Rack and cable all network switches.
   • Install Cumulus Linux.
   • Verify connectivity.
   • Configure out-of-band management.
   • Set hostname.
   • Configure DNS.
   • Configure NTP.

Network Topology
2. Configure leaf switches.
   • Configure each switch in pair individually.
   • Create peer bond between pair.
   • Enable MLAG peering between leaf switches.
3. Configure spine switches.
   • Configure each switch in pair individually.
   • Create peer bond between pair.
   • Enable MLAG peering between spine switches.
4. Set up spine/leaf network fabric.
   • Create a switch-to-switch "uplink" bond on each leaf switch to the spine pair and verify MLAG.
   • Create a switch-to-switch bond on the spine pair to each leaf switch pair and verify MLAG.
5. Configure VLANs.
   • Set up VLANs for traffic.
6. Connect hosts.
   • Provision host OS (if not done already).
   • Connect hosts to leaf switches.
   • Configure hosts with LACP bond uplinks.
   • Provision layer 2 applications.
7. Connect spine switches to core.
   • Configure depending on layer 2 or layer 3 connectivity:
     For layer 2: Create an MLAG port channel to the core for each spine switch. On each core switch, create an MLAG or vendor-equivalent multi-chassis link aggregation to the logical spine switch pair.
     For layer 3: Configure layer 3 switch virtual interface (SVI) gateways.
In a greenfield environment, the order of configuring the spine and leaf switches does not matter, so steps 2 and 3 above may be done in reverse order. In a brownfield environment, start with the leaf switches, as shown above, to minimize network service disruption. The build order is detailed in the following steps. Reference the Network Design and Setup Checklist in the appendix of this guide when building out your network.

1. Set Up Physical Network and Basic Configuration of All Switches
• Rack and cable all network switches
• Install Cumulus Linux
• Configure out-of-band management
• Verify connectivity
• Set hostname
• Configure DNS
• Configure NTP

After racking and cabling all switches, install the Cumulus Linux OS and license on each switch. Refer to the Quick Start Guide of the Cumulus Linux documentation for more information. Next, configure the out-of-band management as needed.

To verify all cables are connected and functional, check the link state. To check the link state of a switch port, run the ip link show command, which displays the physical link and administrative state of the interface along with basic layer 2 information. To show layer 3 information as well, such as the IPv4 or IPv6 address, use the ip addr show command. Look for the UP states in the output. These commands check link state but do not detect traffic flow. Refer to the Configuring and Managing Network Interfaces chapter of the Cumulus Linux documentation for more information.

cumulus@leaf01$ sudo ip link set swp47 up
cumulus@leaf01$ ip link show swp47
49: swp47: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master peerlink state UP mode DEFAULT qlen 500
    link/ether 70:72:cf:9d:4e:64 brd ff:ff:ff:ff:ff:ff

To help verify that cables are connected according to your network topology diagram, check which neighbors are observed from each switch port. For example, to see what is connected to swp47 on leaf01, run the following command and check the LLDP neighbor output to verify that swp47 on leaf02 is at the other end:

cumulus@leaf01$ sudo lldpctl swp47
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    swp47, via: LLDP, RID: 7, Time: 14 days, 20:06:51
  Chassis:
    ChassisID:    mac c4:54:44:bc:ff:f0
    SysName:      leaf02
    SysDescr:     Cumulus Linux
    MgmtIP:       192.168.0.91
    Capability:   Bridge, on
    Capability:   Router, on
  Port:
    PortID:       ifname swp47
    PortDescr:    swp47
-------------------------------------------------------------------------------
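Checking ports one at a time gets tedious on a switch with dozens of interfaces. As an illustrative shortcut only, a small shell loop can print the neighbor name and port seen on several ports at once; the port list here is an assumption taken from the leaf01 interface table above, and the output shown is abbreviated:

cumulus@leaf01$ for p in swp47 swp48 swp49 swp50; do echo "== $p =="; sudo lldpctl $p | grep -E 'SysName|PortID'; done
== swp47 ==
    SysName:      leaf02
    PortID:       ifname swp47
...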
Note: To configure files in Cumulus Linux, use your choice of text editor. A standard Cumulus Linux installation includes nano, vi, and zile, and you can install other editors such as vim from the Cumulus Linux repository, repo.cumulusnetworks.com. The examples in this guide use vi; if you are not familiar with vi, substitute nano or zile, which may have more familiar user interfaces.

The default configuration for eth0, the management interface, is DHCP. To reconfigure eth0 to use a static IP address, edit the /etc/network/interfaces file, adding an IP address/mask and an optional gateway. Refer to the Quick Start Guide, Wired Ethernet Management section of the Cumulus Linux documentation for more information. For example, configure the leaf and spine switches as follows:

cumulus@leaf01$ sudo vi /etc/network/interfaces
auto eth0
iface eth0
    address 192.168.0.90/24
    gateway 192.168.0.254

cumulus@leaf02$ sudo vi /etc/network/interfaces
auto eth0
iface eth0
    address 192.168.0.91/24
    gateway 192.168.0.254

cumulus@spine01$ sudo vi /etc/network/interfaces
auto eth0
iface eth0
    address 192.168.0.94/24
    gateway 192.168.0.254

cumulus@spine02$ sudo vi /etc/network/interfaces
auto eth0
iface eth0
    address 192.168.0.95/24
    gateway 192.168.0.254

Setting Hostname
A switch running Cumulus Linux has the default hostname cumulus. Change this to the appropriate name from your network architecture diagram, such as leaf01 or spine01, by modifying /etc/hostname and /etc/hosts. Changes to the /etc/hostname file do not take effect until you reboot the switch. You can find additional information in the Quick Start Guide, Setting Unique Host Names section of the Cumulus Linux documentation. For example:

cumulus@leaf01$ sudo vi /etc/hostname
leaf01

cumulus@leaf01$ sudo vi /etc/hosts
127.0.0.1 leaf01 localhost

Configuring DNS
Modify your DNS settings if needed by adding your domain, search domain, and nameserver entries to the /etc/resolv.conf file. These changes take effect immediately.

cumulus@leaf01$ sudo vi /etc/resolv.conf
domain example.com
search example.com
nameserver x.x.x.x

Configuring NTP (Network Time Protocol)
By default, NTP is installed and enabled on Cumulus Linux. To override the default NTP servers hosted by Cumulus Networks, edit the /etc/ntp.conf file: find the four servers configured by default and modify those lines to reflect your desired NTP servers.

cumulus@leaf01$ sudo vi /etc/ntp.conf
# pool.ntp.org maps to about 1000 low-stratum NTP servers. Your server will
# pick a different set every time it starts up. Please consider joining the
# pool: <http://www.pool.ntp.org/join.html>
server 0.cumulusnetworks.pool.ntp.org iburst
server 1.cumulusnetworks.pool.ntp.org iburst
server 2.cumulusnetworks.pool.ntp.org iburst
server 3.cumulusnetworks.pool.ntp.org iburst

After modifying the NTP servers, restart the NTP daemon to read in the changes to the configuration file:

cumulus@leaf01:~$ sudo service ntp restart
[ ok ] Stopping NTP server: ntpd.
[ ok ] Starting NTP server: ntpd.
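To confirm the switch actually synchronizes against the servers you configured, you can query the daemon's peer list with ntpq, which ships with the ntp package. The output below is illustrative rather than taken from the reference build; an asterisk in the first column marks the peer currently selected for synchronization:

cumulus@leaf01:~$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*0.cumulusnetwo .GPS.            1 u   33   64  377    1.234    0.056   0.021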
2. Configure Leaf Switches
• Configure each switch in the MLAG pair individually
• Create peer link bond between pair
• Enable MLAG peering between leaf switches

Configure Each Switch
By default, a switch with Cumulus Linux freshly installed has no switch port interfaces defined. Define the basic characteristics of swp1 through swpN by creating stanza entries for each switch port (swp) in the /etc/network/interfaces file. The required statements are:

auto <switch port name>
iface <switch port name>

An MTU setting of 9216 is recommended to avoid packet fragmentation. For example:

cumulus@leaf01$ sudo vi /etc/network/interfaces
# physical interface configuration
auto swp1
iface swp1
    mtu 9216

auto swp2
iface swp2
    mtu 9216

auto swp3
iface swp3
    mtu 9216
. . .
auto swp52
iface swp52
    mtu 9216

You can set additional attributes, such as speed and duplex; refer to the Configuring Switch Port Attributes chapter of the Cumulus Linux documentation for more information. Configure all leaf switches identically. Instead of manually writing each interface definition, you can define them programmatically using shorthand syntax that leverages Python Mako templates, as shown in the sketch at the end of this subsection; the Automation Considerations chapter later in this guide covers this in more detail.

Once all configurations have been defined in the /etc/network/interfaces file, run the ifquery command to ensure that the syntax is correct and the interfaces will be created as expected:

cumulus@leaf01$ ifquery -a
auto lo
iface lo inet loopback

auto eth0
iface eth0
    address 192.168.0.90/24
    gateway 192.168.0.254

auto swp1
iface swp1
    mtu 9216
. . .

Next, apply the configurations so they are loaded into the kernel. There are several methods for applying configuration changes, depending on when and what changes you want to apply.

sudo ifreload -a
    Parses interfaces labeled with auto that have been added to or modified in the configuration file, and applies changes accordingly. Note: this command is disruptive only to traffic on interfaces that have been modified.

sudo service networking restart
    Restarts all interfaces labeled with auto as defined in the configuration file, regardless of what has or has not been recently modified. Note: this command is disruptive to all traffic on the switch, including the eth0 management network.

sudo ifup <swpX>
    Parses an individual interface labeled with auto as defined in the configuration file and applies changes accordingly. Note: this command is disruptive only to traffic on interface swpX.

For example, on leaf01:

cumulus@leaf01:~$ sudo ifreload -a

or individually:

cumulus@leaf01:~$ sudo ifup swp1
cumulus@leaf01:~$ sudo ifup swp2
. . .
cumulus@leaf01:~$ sudo ifup swp52

Create Peer Link Bond between Switches
Next, create a peer link bond on both switches by editing /etc/network/interfaces and placing the bond configuration after the swpN interfaces. Configure the peer link bond identically on both switches in the MLAG pair. For example, add the following peerlink stanza on leaf01 with bond members swp47 and swp48; the configuration is identical on leaf02. For more information on bond settings, refer to the MLAG chapter in the Cumulus Linux documentation.

cumulus@leaf01$ sudo vi /etc/network/interfaces
# peerlink bond for clag
auto peerlink
iface peerlink
    bond-slaves swp47 swp48
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
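As promised above, the repetitive swpN stanzas can be generated instead of typed by hand. The following is a minimal sketch using Mako template syntax inside /etc/network/interfaces, assuming your Cumulus Linux release has ifupdown2's Mako template support enabled; the port range is illustrative. See the Network Configuration Templates Using Mako section later in this guide for the full treatment.

# expands to auto/iface stanzas with MTU 9216 for swp1 through swp52
%for port in range(1, 53):
auto swp${port}
iface swp${port}
    mtu 9216
%endfor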
Enable MLAG Peering between Switches
An instance of the clagd daemon runs on each MLAG switch member to keep track of the networking information, including MAC addresses, needed to maintain the peer relationship. clagd communicates with its peer on the other switch across a layer 3 interface between the two switches. This layer 3 network should not be advertised by routing protocols, nor should its VLAN be trunked anywhere else in the network: the interface exists solely as a keepalive reachability test and to synchronize switch state across the directly attached peer bond.

Create the VLAN subinterface for clagd communication and assign it an IP address. A unique 802.1Q tag is recommended so that data traffic is not mixed with clagd control traffic.

To enable MLAG peering between switches, configure clagd on each switch by creating a peerlink subinterface in /etc/network/interfaces with a unique 802.1Q tag, and set the following parameters under the subinterface:

• address: the local IP address/netmask of this peer switch. A link-local address is recommended, for example 169.254.1.X/30.
• clagd-enable: set to yes (the default).
• clagd-peer-ip: the IP address assigned to the peer subinterface on the peer switch.
• clagd-backup-ip: an IP address on the peer switch that is reachable independently of the peer link, for example the management interface or a routed interface that does not traverse the peer link.
• clagd-sys-mac: a unique MAC address you assign to both peer switches, recommended to be within the Cumulus Networks reserved range of 44:38:39:FF:00:00 through 44:38:39:FF:FF:FF.

For example, configure leaf01 and leaf02 as follows:

cumulus@leaf01$ sudo vi /etc/network/interfaces
# VLAN for clagd communication
auto peerlink.4094
iface peerlink.4094
    address 169.254.1.1/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.2
    clagd-backup-ip 192.168.0.91/24
    clagd-sys-mac 44:38:39:ff:40:94

cumulus@leaf02$ sudo vi /etc/network/interfaces
# VLAN for clagd communication
auto peerlink.4094
iface peerlink.4094
    address 169.254.1.2/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.1
    clagd-backup-ip 192.168.0.90/24
    clagd-sys-mac 44:38:39:ff:40:94

Note: MLAG can use any valid IP address pair for communication; however, we suggest using values from the IPv4 link-local range, 169.254.0.0/16. These addresses are not exported by routing protocols, and because the peer communication VLAN is local to the peer link, the same address pair can be reused on every MLAG switch pair.

Because the MTU of a subinterface is inherited from its parent interface, and the parent (peerlink) was previously defined with an MTU setting, there is no need to set the MTU in the subinterface stanza.

Reload the network configuration to apply the MLAG changes and start clagd:

On leaf01:
cumulus@leaf01:~$ sudo ifreload -a
On leaf02:
cumulus@leaf02:~$ sudo ifreload -a

or individually restart just the peerlink and its subinterface:

On leaf01:
cumulus@leaf01:~$ sudo ifup peerlink
cumulus@leaf01:~$ sudo ifup peerlink.4094
On leaf02:
cumulus@leaf02:~$ sudo ifup peerlink
cumulus@leaf02:~$ sudo ifup peerlink.4094

Once clagd is configured under the peerlink subinterface, it starts automatically when the system boots.
Once the interfaces have been started, verify they are up and have the proper IP addresses assigned:

cumulus@leaf01:~$ ip addr show peerlink.4094
115: peerlink.4094@peerlink: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP
    link/ether c4:54:44:9f:49:1e brd ff:ff:ff:ff:ff:ff
    inet 169.254.1.1/30 scope global peerlink.4094
    inet6 fe80::7272:cfff:fe9d:4e64/64 scope link
       valid_lft forever preferred_lft forever

cumulus@leaf02$ ip addr show peerlink.4094
105: peerlink.4094@peerlink: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP
    link/ether c4:54:44:bd:00:20 brd ff:ff:ff:ff:ff:ff
    inet 169.254.1.2/30 scope global peerlink.4094
    inet6 fe80::c654:44ff:fe9f:491e/64 scope link
       valid_lft forever preferred_lft forever

Next, verify connectivity between the switches by pinging the backup IP address and the address across the peer bond from each switch to its peer. For example, from leaf01 you should be able to reach both the clagd backup IP address on leaf02, 192.168.0.91, and its peer subinterface address, 169.254.1.2:

cumulus@leaf01$ ping 192.168.0.91
PING 192.168.0.91 (192.168.0.91) 56(84) bytes of data.
64 bytes from 192.168.0.91: icmp_req=1 ttl=64 time=0.258 ms
64 bytes from 192.168.0.91: icmp_req=2 ttl=64 time=0.210 ms
--- 192.168.0.91 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.210/0.234/0.258/0.024 ms

cumulus@leaf01:~$ ping 169.254.1.2
PING 169.254.1.2 (169.254.1.2) 56(84) bytes of data.
64 bytes from 169.254.1.2: icmp_req=1 ttl=64 time=0.798 ms
64 bytes from 169.254.1.2: icmp_req=2 ttl=64 time=0.554 ms
--- 169.254.1.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.554/0.676/0.798/0.122 ms

Likewise, from leaf02 you should be able to ping the backup IP address and the clagd subinterface address on leaf01, 192.168.0.90 and 169.254.1.1.

Verify MLAG port channel operation and that the peer roles have been established. In the example below, leaf01 is operating as the primary peer switch and leaf02 as the secondary. By default, the priority values of both switches are equal, 32768; in a priority tie, the switch with the lower MAC address takes the primary role.

cumulus@leaf01$ clagctl
The peer is alive
     Our Priority, ID, and Role: 32768 c4:54:44:9f:49:1e primary
    Peer Priority, ID, and Role: 32768 c4:54:44:bd:00:20 secondary
          Peer Interface and IP: peerlink.4094 169.254.1.2
                      Backup IP: 192.168.0.91 (active)
                     System MAC: 44:38:39:ff:40:94

When an MLAG-enabled switch is in the secondary role, it does not send BPDUs on dual-connected links; it sends BPDUs only on single-connected links. Also, if the peer switch is determined to be no longer alive, the secondary switch falls back to using the bond interface MAC address as the link MAC address instead of the clagd-sys-mac. By contrast, the switch in the primary role always uses the clagd-sys-mac and sends BPDUs on all single- and dual-connected links.

The distinction between the primary and secondary peer switch matters when restarting switches. If a secondary peer switch is restarted, the LACP system ID remains the same; if a primary peer switch is restarted, the LACP system ID changes, which can be disruptive. Changing the priority does not interrupt traffic, but it takes a few seconds to switch over while the switch waits for the next peer update.
To change the priority so that leaf02 becomes the primary and leaf01 the secondary, use the clagctl command with the priority parameter:

cumulus@leaf02$ sudo clagctl priority 4096
cumulus@leaf02$ clagctl
The peer is alive
     Our Priority, ID, and Role: 4096 c4:54:44:bd:00:20 primary
    Peer Priority, ID, and Role: 32768 c4:54:44:9f:49:1e secondary
          Peer Interface and IP: peerlink.4094 169.254.1.1
                      Backup IP: 192.168.0.90 (active)
                     System MAC: 44:38:39:ff:40:94

3. Configure Spine Switches
• Configure each switch in pair individually
• Create peer link bond between pair
• Enable MLAG peering between spine switches

Configure Each Switch
As with the leaf switches, define all the switch ports on your spine switches that will be in use; a fully populated spine switch typically has swp1 through swp32 defined. Define a port by creating a stanza entry for each switch port (swp) in the /etc/network/interfaces file. For example:

cumulus@spine01$ sudo vi /etc/network/interfaces
# physical interface configuration
auto swp1
iface swp1
    mtu 9216

auto swp2
iface swp2
    mtu 9216

auto swp3
iface swp3
    mtu 9216
. . .
auto swp32
iface swp32
    mtu 9216

Configure both spine switches identically. As on the leaf switches, the interface definitions can be generated programmatically using Mako templates instead of being written by hand; refer to the Automation Considerations chapter later in this guide.

As you did previously with the leaf switches:
• Use sudo ifquery -a to verify all interfaces are properly defined in the configuration file.
• Bring up the interfaces with sudo ifreload -a, or individually with sudo ifup swpN.

Create Peer Link Bond between Switches
Next, create a peer link bond on both spine switches in the same manner as for the leaf switch pairs: edit the /etc/network/interfaces file and place the bond configuration after the swpN interfaces. Configure the peer link bond identically on both switches in the MLAG pair. Add the following peerlink stanza on spine01, with swp25 and swp26 as bond members per the spine01 interface table above; the configuration is identical on spine02.

cumulus@spine01$ sudo vi /etc/network/interfaces
# peerlink bond for clag
auto peerlink
iface peerlink
    bond-slaves swp25 swp26
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4

Enable MLAG Peering between Switches
To enable MLAG peering between the spine switches, configure the clagd daemon by adding the MLAG parameters under the peerlink subinterface, just as on the leaf switches:

cumulus@spine01$ sudo vi /etc/network/interfaces
# VLAN for clagd communication
auto peerlink.4093
iface peerlink.4093
    address 169.254.1.1/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.2
    clagd-backup-ip 192.168.0.95/24
    clagd-sys-mac 44:38:39:ff:40:93

cumulus@spine02$ sudo vi /etc/network/interfaces
# VLAN for clagd communication
auto peerlink.4093
iface peerlink.4093
    address 169.254.1.2/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.1
    clagd-backup-ip 192.168.0.94/24
    clagd-sys-mac 44:38:39:ff:40:93

We use VLAN tag 4093 for the peerlink communication between spine01 and spine02, in contrast to 4094 between leaf01 and leaf02. Using different VLAN IDs for the different peerlink communication links avoids the potential for creating an undesired loop.
Next, reload the network configuration to apply the MLAG changes and start clagd:

On spine01:
cumulus@spine01:~$ sudo ifreload -a
On spine02:
cumulus@spine02:~$ sudo ifreload -a

or individually restart just the peerlink and its subinterface:

On spine01:
cumulus@spine01:~$ sudo ifup peerlink
cumulus@spine01:~$ sudo ifup peerlink.4093
On spine02:
cumulus@spine02:~$ sudo ifup peerlink
cumulus@spine02:~$ sudo ifup peerlink.4093

Once the interfaces have been started, verify they are up and have the proper IP addresses assigned. For example, on spine01:

cumulus@spine01$ ip addr show peerlink.4093
36: peerlink.4093@peerlink: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP
    link/ether c4:54:44:72:cf:ad brd ff:ff:ff:ff:ff:ff
    inet 169.254.1.1/30 scope global peerlink.4093
    inet6 fe80::c654:44ff:fe72:cfad/64 scope link
       valid_lft forever preferred_lft forever

Next, verify connectivity between the switches by pinging across the peer bond from each switch to its peer. For example, from spine01 you should be able to reach both the clagd backup IP address on spine02, 192.168.0.95, and its peer subinterface address, 169.254.1.2:

cumulus@spine01$ ping 192.168.0.95
PING 192.168.0.95 (192.168.0.95) 56(84) bytes of data.
64 bytes from 192.168.0.95: icmp_req=1 ttl=64 time=0.277 ms
64 bytes from 192.168.0.95: icmp_req=2 ttl=64 time=0.300 ms
--- 192.168.0.95 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.277/0.288/0.300/0.020 ms

cumulus@spine01:~$ ping 169.254.1.2
PING 169.254.1.2 (169.254.1.2) 56(84) bytes of data.
64 bytes from 169.254.1.2: icmp_req=1 ttl=64 time=0.725 ms
64 bytes from 169.254.1.2: icmp_req=2 ttl=64 time=0.916 ms
--- 169.254.1.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.725/0.820/0.916/0.099 ms

Likewise, from spine02 you should be able to ping the clagd backup IP address, 192.168.0.94, and the peer subinterface address, 169.254.1.1, on spine01.

Verify MLAG port channel operation and that the peer roles have been established. In the following example, spine01 is operating as the primary peer switch and spine02 as the secondary:

cumulus@spine01$ clagctl
The peer is alive
     Our Priority, ID, and Role: 32768 c4:54:44:72:cf:ad primary
    Peer Priority, ID, and Role: 32768 c4:54:44:72:dd:c9 secondary
          Peer Interface and IP: peerlink.4093 169.254.1.2
                      Backup IP: 192.168.0.95 (active)
                     System MAC: 44:38:39:ff:40:93

4. Set Up Spine/Leaf Network Fabric
• Create a switch-to-switch "uplink" bond on each leaf switch to the spine pair and verify MLAG
• Create a switch-to-switch bond on the spine pair to each leaf switch pair and verify MLAG

Creating Switch-to-Switch Spine Bond on Leaf Switches
Now that the peer relationship has been established on the leaf and spine switches, create the switch-to-switch bonds on the leaf switches by editing the /etc/network/interfaces file and placing the bond configuration after the swpN interfaces. You must specify a unique clag-id for every dual-connected bond on each peer switch; the value must be between 1 and 65535 and must match on both peer switches for the bond to be considered dual-connected. For the uplink1 bond the clag-id is set to 1000, and the host bond clag-ids start at 1.
cumulus@leaf01$ sudo vi /etc/network/interfaces
# uplink bond to spine
auto uplink1
iface uplink1
    bond-slaves swp49 swp50
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 1000

Configure leaf02 identically to leaf01. Once these interfaces have been created, apply the configuration with the ifreload command or by bringing up the new interfaces individually on both switches:

On leaf01:
cumulus@leaf01:~$ sudo ifreload -a
On leaf02:
cumulus@leaf02:~$ sudo ifreload -a

or individually:

On leaf01:
cumulus@leaf01:~$ sudo ifup uplink1
On leaf02:
cumulus@leaf02:~$ sudo ifup uplink1

Verify switch-to-switch MLAG port channel operation on both leaf switches using the clagctl command. On leaf01:

cumulus@leaf01$ clagctl
The peer is alive
    Peer Priority, ID, and Role: 4096 c4:54:44:bd:00:20 primary
     Our Priority, ID, and Role: 32768 c4:54:44:9f:49:1e secondary
          Peer Interface and IP: peerlink.4094 169.254.1.2
                      Backup IP: 192.168.0.91 (active)
                     System MAC: 44:38:39:ff:40:94

Creating Switch-to-Switch MLAG Bond on Spine Pair to Each Leaf Switch Pair
Create the switch-to-switch MLAG bonds on the spine switches by editing the /etc/network/interfaces file and placing the bond configuration after the swpN interfaces. For example, on spine01, create downlink1 to aggregate traffic to the leaf01/leaf02 pair and downlink2 for the leaf03/leaf04 pair. On both spine switches, the clag-id is set to 1 for the downlink1 bond and 2 for the downlink2 bond.

cumulus@spine01$ sudo vi /etc/network/interfaces
# leaf01-leaf02 downlink
auto downlink1
iface downlink1
    bond-slaves swp1 swp2
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 1

# leaf03-leaf04 downlink
auto downlink2
iface downlink2
    bond-slaves swp3 swp4
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 2

Configure spine02 identically to spine01. Once these interfaces have been created, apply the configuration with the ifreload command or by bringing up the new interfaces individually on both switches:
cumulus@leaf01$ clagctl The peer is alive Peer Priority, ID, and Role: Our Priority, ID, and Role: Peer Interface and IP: Backup IP: System MAC: Our Interface ---------------uplink1 www.cumulusnetworks.com 4096 c4:54:44:bd:00:20 primary 32768 c4:54:44:9f:49:1e secondary peerlink.4094 169.254.1.2 192.168.0.91 (active) 44:38:39:ff:40:94 Peer Interface ---------------uplink1 MLAG Id ------1000 29 DATA CENTER LAYER 2 HIGH AVAILAB ILITY: VALIDATED DESIGN GUI DE 5. Configure VLANs Create the VLANs expected for your traffic. For example: VLAN 10: in-band hypervisor communications VLAN 15: in-band virtual SAN storage VLAN 20-23: VM traffic for WWW services VLAN 30: VM traffic for application 1 data VLAN 40: VM traffic for application 2 data To support VLANs in Cumulus Linux 2.5, a single bridge must be created in /etc/network/interfaces. Create the VLANaware bridge on all spine and leaf switches. For example, on leaf01: cumulus@leaf01$ sudo vi /etc/network/interfaces auto bridge iface bridge bridge-vlan-aware yes bridge-ports peerlink uplink1 bridge-vids 10,15,20,21,22,23,30,40,1000-2000 bridge-pvid 1 bridge-stp on This stanza defines a VLAN-aware bridge for high VLAN scale and assigns the infrastructure ports used for layer 2 links to the bridge, and assign VLANs to the network infrastructure. This list of VLAN IDs is inherited by all layer 2 interfaces in the bridge, unless different values are specified under an interface. In this configuration, all VLAN IDs are trunked to all layer 2 interfaces and bonds. The untagged or native VLAN for the infrastructure ports is defined by bridge-pvid; if one is not specified, the default value is VLAN ID 1. Setting the ID to 1 is a best practice for spanning tree switch interoperability. Finally, the stanza enables spanning tree on the bridge. To verify spanning tree operation on the bridge, use the mstpctl command. 
To verify spanning tree operation on the bridge, use the mstpctl command. The following example shows the spanning tree port information for the uplink interface:

cumulus@leaf01$ mstpctl showportdetail bridge uplink1
bridge:uplink1 CIST info
  enabled            yes                        role                  Root
  port id            8.006                      state                 forwarding
  external port cost 305                        admin external cost   0
  internal port cost 305                        admin internal cost   0
  designated root    1.000.44:38:39:FF:00:00    dsgn external cost    305
  dsgn regional root 1.000.44:38:39:FF:77:00    dsgn internal cost    0
  designated bridge  1.000.44:38:39:FF:77:00    designated port       8.001
  admin edge port    no                         auto edge port        yes
  oper edge port     no                         topology change ack   no
  point-to-point     yes                        admin point-to-point  auto
  restricted role    no                         restricted TCN        no
  port hello time    2                          disputed              yes
  bpdu guard port    no                         bpdu guard error      no
  network port       no                         BA inconsistent       no
  Num TX BPDU        31237                      Num TX TCN            11
  Num RX BPDU        38123                      Num RX TCN            119
  Num Transition FWD 4                          Num Transition BLK    4
  bpdufilter port    no
  clag ISL           no                         clag ISL Oper UP      no
  clag role          primary                    clag dual conn mac    44:38:39:ff:77:0
  clag remote portID F.FFF                      clag system mac       44:38:39:ff:40:94

Finally, to verify the VLAN assignment, use the bridge vlan show command, for example on spine01:

cumulus@spine01$ bridge vlan show
port        vlan ids
peerlink    1 PVID Egress Untagged
            10 15 20-23 30 40 1000-2000
downlink1   1 PVID Egress Untagged
            10 15 20-23 30 40 1000-2000
downlink2   1 PVID Egress Untagged
            10 15 20-23 30 40 1000-2000

To see which interfaces a specific VLAN is associated with, for example VLAN 10, use the bridge vlan show vlan command:

cumulus@spine01$ bridge vlan show vlan 10
VLAN 10:
  peerlink
  downlink1
  downlink2

6. Connect Hosts
• Provision the host OS
• Connect hosts to leaf switches using virtual switches

Provision the host operating systems, if you have not done so already. Since the data center networking fabric has been set up, it is time to connect the hosts to the leaf switches. To improve network reliability and utilization, use host link redundancy and aggregation: set up LACP bonds on your hosts (a host-side sketch follows at the end of this step), and on each leaf switch create the corresponding bond interface for the host-facing ports.

To enable LACP, first create a new host bond interface. Each switch in the MLAG-enabled leaf pair bonds its single host-facing port, and together the pair presents one LACP bond to the host. For example, create the following bond on leaf01, and make the identical configuration on leaf02:

cumulus@leaf01$ sudo vi /etc/network/interfaces
auto host-01
iface host-01
    bond-slaves swp1
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 1

Once the host bond is created, add the bond to the VLAN-aware bridge to enable VLAN trunking to the host:

cumulus@leaf01$ sudo vi /etc/network/interfaces
auto bridge
iface bridge
    bridge-ports peerlink uplink1 host-01

By default, the host-01 interface inherits the bridge's list of VLANs. To override this (for example, to prune VLANs 1000-2000), configure the allowed VLANs on the host-01 interface, omitting the 1000-2000 range from the global configuration:

cumulus@leaf01$ sudo vi /etc/network/interfaces
auto host-01
iface host-01
    bridge-vids 10,15,20,21,22,23,30,40
    # optional
    bridge-pvid 1
    mstpctl-portadminedge yes
    mstpctl-bpduguard yes

Optional: Set the untagged or native VLAN for the host bond if changing it from the default VLAN ID of 1 or from the global value set under the VLAN-aware bridge.

Optional: Set the port to admin edge mode immediately instead of waiting for automatic admin edge detection.
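For the host side of these bonds, configuration depends on the operating system or hypervisor in use. Purely as an illustration, a Debian-style Linux host using ifupdown with the ifenslave package might bond two NICs as follows; the NIC names and addressing are assumptions, and a host consuming the trunk would typically add VLAN subinterfaces on top of the bond:

auto bond0
iface bond0 inet static
    # hypothetical address on VLAN 10
    address 10.1.10.100
    netmask 255.255.255.0
    # hypothetical NIC names on the host
    bond-slaves eth1 eth2
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1
    bond-xmit-hash-policy layer3+4

The bond mode and hash policy mirror the switch side, so both ends negotiate a standard 802.3ad LAG.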
Optional: Set the port to admin edge mode immediately, instead of waiting for the automatic admin edge detection.

Optional: It is a best practice to enable BPDU guard on all host-facing ports and bonds. Doing so prevents the accidental connection of a switch to a host port; in the event a BPDU is unintentionally received, BPDU guard disables that port. To verify that BPDU guard has been enabled on a port, use the mstpctl command and look for the bpdu information:

cumulus@leaf01$ mstpctl showportdetail bridge host-01 | grep bpdu
  bpdu guard port    yes                      bpdu guard error     no
  bpdufilter port    no

To verify the VLAN information is configured correctly on both MLAG peer switches, use the clagctl verifyvlans command. This command checks that the VLANs are correctly configured on each dual-connected bond. To see the entire listing of VLANs as well as validate the configuration, add the -v flag:

cumulus@leaf01$ clagctl -v verifyvlans
Our Bond Interface   VlanId   Peer Bond Interface
--------------------------------------------------
host-02              1        host-02
host-02              10       host-02
.
.
host-02              40       host-02
uplink1              1        uplink1
uplink1              10       uplink1
.
.
uplink1              2000     uplink1
host-01              1        host-01
host-01              10       host-01
.
.
host-01              40       host-01

Optional: By default, the list of VLANs is inherited on the host-01 interface. If the host connects to only a single VLAN, for example VLAN 10, set the port to an access port instead of a trunk, as follows:

cumulus@leaf01$ sudo vi /etc/network/interfaces
auto host-01
iface host-01
    bridge-access 10

7. Connect Spine Switches to Core

Method 1. External Connectivity at Layer 2

The following steps assume the spine switches connect to core switches at layer 2.

• Create an MLAG port channel to the core for each spine switch
• On each core device, create a vendor-equivalent MLAG bond to the spine switch pair
• Configure the core

If your core switches do not support a multi-chassis LACP solution, you will need to configure two separate bonds instead. If your core devices do support MLAG or an equivalent, create a single switch-to-switch MLAG bond on the spine switches by editing the /etc/network/interfaces file and placing the bond configuration after the swpN interfaces. For example, on spine01, create the core-uplink interface to aggregate traffic from spine01 and spine02 to the core. The clag-id for the core-uplink bond is set to 1000 on both spine switches.

cumulus@spine01$ sudo vi /etc/network/interfaces
# bond to core
auto core-uplink
iface core-uplink
    bond-slaves swp31 swp32
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 1000

Configure spine02 identically to spine01, so that traffic from spine01 and spine02 is aggregated into a single MLAG bond. Once these interfaces have been created, apply the configuration by using the ifreload command, or by individually bringing up the new interfaces on both switches.

On spine01:
cumulus@spine01:~$ sudo ifreload -a

On spine02:
cumulus@spine02:~$ sudo ifreload -a

Or individually:

On spine01:
cumulus@spine01:~$ sudo ifup core-uplink

On spine02:
cumulus@spine02:~$ sudo ifup core-uplink

Verify switch-to-switch MLAG port channel operation on both spine switches using the clagctl command.
On spine01:

cumulus@spine01$ clagctl
The peer is alive
     Our Priority, ID, and Role: 32768 c4:54:44:72:cf:ad primary
    Peer Priority, ID, and Role: 32768 c4:54:44:72:dd:c9 secondary
          Peer Interface and IP: peerlink.4093 169.254.1.2
                      Backup IP: 192.168.0.95 (active)
                     System MAC: 44:38:39:ff:40:93

Dual Attached Ports
Our Interface      Peer Interface     MLAG Id
----------------   ----------------   -------
downlink1          downlink1          1
downlink2          downlink2          2
core-uplink        core-uplink        1000

Method 2. External Connectivity at Layer 3

To provide a layer 3 gateway for a VLAN, use the first-hop redundancy protocol provided in Cumulus Linux, Virtual Router Redundancy (VRR). VRR provides layer 3 redundancy by using the same virtual IP and MAC addresses on each switch, allowing traffic arriving across an MLAG to be forwarded regardless of which switch it arrived on. In this configuration, the switch pair works in an active/active capacity. VRR also works in a non-MLAG environment where a host is in an active/active or active/standby role. For more information, refer to the Virtual Router Redundancy (VRR) chapter in the Cumulus Linux documentation.

The following steps assume the spine switches connect to core switches at layer 3.

• Configure layer 3 switch virtual interface (SVI) gateways and connect the spine switches to core switches
• Configure core-facing interfaces for IP transfer
• Configure a dynamic routing protocol

(A sketch covering the last two bullets follows the gateway configuration below.)

To enable the gateway, first create the layer 3 virtual interface for that VLAN on the VLAN-aware bridge. For example, to configure this on VLAN 10, add the following stanza to the network configuration:

cumulus@spine01$ sudo vi /etc/network/interfaces
auto bridge.10
iface bridge.10
    address 10.1.10.2/24
    address-virtual 00:00:5E:00:01:01 10.1.10.1/24

cumulus@spine02$ sudo vi /etc/network/interfaces
auto bridge.10
iface bridge.10
    address 10.1.10.3/24
    address-virtual 00:00:5E:00:01:01 10.1.10.1/24

This stanza defines a routable interface for VLAN 10 and assigns an IP address to it. If, for example, your gateway is the first IP address in the subnet, such as 10.1.10.1, assign the actual interface the address 10.1.10.2. On spine02, make sure the base IP address is unique, such as 10.1.10.3.

In this example configuration, address-virtual creates a virtual interface with the address 10.1.10.1, assigned to bridge VLAN ID 10, with virtual MAC 00:00:5E:00:01:01. These virtual IP and MAC addresses are shared between the pair of switches for load balancing and failover. The MAC address is in the VRRP MAC address range of 00:00:5E:00:01:XX and does not overlap with other MAC addresses in the network.

For each desired gateway VLAN, replicate the above configuration, changing the IP addressing to match the subnet and changing bridge.N, where N is the VLAN ID.
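The remaining two bullets, the core-facing layer 3 interfaces and the dynamic routing protocol, depend heavily on your core design, so only a minimal sketch is shown here. It assumes swp29 as a routed core-facing port, OSPF running under the Quagga routing suite shipped with Cumulus Linux, and illustrative addressing; none of these values comes from the validated topology:

cumulus@spine01$ sudo vi /etc/network/interfaces
# hypothetical point-to-point routed link to the core (not a bridge member)
auto swp29
iface swp29
    address 10.255.0.1/30

cumulus@spine01$ sudo vi /etc/quagga/daemons
zebra=yes
ospfd=yes

cumulus@spine01$ sudo vi /etc/quagga/Quagga.conf
router ospf
  ospf router-id 10.0.0.11
  ! advertise the core-facing link and the VLAN 10 gateway subnet
  network 10.255.0.0/30 area 0.0.0.0
  network 10.1.10.0/24 area 0.0.0.0

cumulus@spine01$ sudo service quagga restart

The design decisions this sketch glosses over (areas, timers, reference bandwidth, numbered versus unnumbered interfaces, OSPF versus BGP) are summarized in the checklist in Appendix B.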
To verify the virtual address has been created, first check the bridge.10 interface:

cumulus@spine01$ ip addr show bridge.10
53: bridge.10@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP
    link/ether c4:54:44:72:cf:35 brd ff:ff:ff:ff:ff:ff
    inet 10.1.10.2/24 scope global bridge.10
    inet6 fe80::c654:44ff:fe72:cf35/64 scope link
       valid_lft forever preferred_lft forever

When the address-virtual keyword is placed under a layer 3 bridge ID, it automatically creates a virtual interface named (bridge name)-(VLAN ID)-v(virtual instance). In the above example, the bridge name is bridge, the VLAN ID is 10, and the virtual instance is 0. Thus, the interface bridge-10-v0 has the virtual MAC address and virtual IP address assigned under the bridge.10 interface. If additional virtual addresses are added to the interface, each will have its own instance. To see that the virtual interface is operational, use the ip addr show command:

cumulus@spine01$ ip addr show bridge-10-v0
77: bridge-10-v0@bridge.10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UNKNOWN
    link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
    inet 10.1.10.1/32 scope global bridge-10-v0
    inet6 fe80::200:5eff:fe00:101/64 scope link
       valid_lft forever preferred_lft forever

Automation Considerations

Automation and Converged Administration

Because Cumulus Linux is Linux, you can draw from a rich ecosystem of solutions to automate the provisioning and management of your switches. Automation tools range from Linux-based scripts to applications that run natively on the Linux OS. To keep organizational roles distinct and allow both server and network teams to focus on their own areas of expertise and responsibility, while still providing a ubiquitous form of centralized management, you can leverage tools such as Puppet, Chef, and Ansible that are already widely used for compute management.

The following sections provide automation and template examples using zero touch provisioning, Mako, Ansible, and Puppet. Additional examples are available as demos in the Cumulus Networks Knowledge Base under Demos and Training at support.cumulusnetworks.com/hc/en-us/sections/200398866; the source code for the demos can be found in the Cumulus Networks GitHub repository at github.com/CumulusNetworks/.

Automated Switch Provisioning with Zero Touch Provisioning

By default, a switch with Cumulus Linux freshly installed looks for and invokes an automation script. This process is called zero touch provisioning and is triggered by the following conditions:

• The management port (eth0) is configured for DHCP
• The management port is restarted, or the switch is powered on, using one of the following commands:
  o service networking restart
  o reboot
  o ifdown and ifup on the port
• The DHCP server is configured with option cumulus-provision-url code 239
• The DHCP server is configured with the URL of the automation script to execute

Alternatively, zero touch provisioning can be run manually with /usr/lib/cumulus/autoprovision. For example:

cumulus@switch$ sudo /usr/lib/cumulus/autoprovision -u http://10.99.0.1/script.sh

More details on the autoprovision command can be obtained by running the command with the -h option.

Zero touch provisioning runs a script from the URL provided by DHCP or specified manually. Supported languages include Bash, Ruby, Python, and Perl. As a failsafe mechanism, zero touch provisioning looks for a CUMULUS-AUTOPROVISIONING flag when retrieving the script, prior to executing it (note the flag embedded as a comment in the example script below).
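For example, a minimal isc-dhcp-server sketch that satisfies the two DHCP conditions listed above; the option definition follows the code 239 convention already described, while the subnet, address range, and script URL are illustrative assumptions:

# /etc/dhcp/dhcpd.conf on the out-of-band management server (hypothetical)
option cumulus-provision-url code 239 = text;

subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.100 192.168.0.200;
    # web server on the management network hosting the ZTP script
    option cumulus-provision-url "http://192.168.0.254/ztp.sh";
}

When eth0 on a freshly installed switch receives a lease from this scope, zero touch provisioning downloads and executes ztp.sh.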
The script can automate many provisioning functions, such as:

• Install the Cumulus Linux license
• Change the hostname
• Run apt-get update
• Install automation tools such as Puppet or Chef
• Create users or integrate with authentication
• Configure sudoers for administrative privileges of users

For example, a script that installs the Cumulus Linux license and SSH keys from a central server at 10.99.0.1 looks as follows:

#!/bin/bash
# CUMULUS-AUTOPROVISIONING

# install license from web server
wget -q -O /root/license.txt http://10.99.0.1/license.txt
/usr/cumulus/bin/cl-license -i /root/license.txt

# install ssh keys from web server
/usr/bin/wget -O /root/.ssh/authorized_keys http://10.99.0.1/authorized_keys

exit 0

For more information, refer to the Zero Touch Provisioning chapter of the Cumulus Linux documentation.

Network Configuration Templates Using Mako

In the prior section, Network Architecture Build Steps, we showed interface configurations that are manually entered in the /etc/network/interfaces file. Instead of manually configuring each interface definition, you can define them programmatically using a shorthand syntax that leverages Python Mako templates.

For example, the following syntax programmatically defines the interface ports for the hosts, representing what would otherwise take much longer to configure manually. This Mako template:

cumulus@leaf01$ sudo vi /etc/network/interfaces
<% Host_ports = range(1,45) %>
% for i in Host_ports:
auto swp${i}
iface swp${i}
    mtu 9216
% endfor

is equivalent to:

cumulus@leaf01$ sudo vi /etc/network/interfaces
auto swp1
iface swp1
    mtu 9216

auto swp2
iface swp2
    mtu 9216

auto swp3
iface swp3
    mtu 9216
.
.
.
auto swp43
iface swp43
    mtu 9216

auto swp44
iface swp44
    mtu 9216

For more information and an example, see the knowledge base article Configuring /etc/network/interfaces with Mako at support.cumulusnetworks.com/hc/en-us/articles/202868023.

Automated Network Configuration Using Ansible

Ansible is an open source, lightweight configuration management tool that can be used to automate many configuration tasks. Ansible does not require an agent to run on a switch; instead, it manages nodes over SSH. With Ansible you can run automation tasks across many endpoints, whereas Mako operates only within the context of a single switch. In Ansible, a script that runs a series of tasks is referred to as a playbook.

The following example changes the MTU for a group of switch ports, repeating the example previously shown with Mako. On the controller, run the tree command to show where the playbook and related files reside:

root@ubuntu# tree
.
├── ansible.cfg
├── ansible.hosts
├── roles
│   └── leaf
│       ├── tasks
│       │   └── main.yml
│       ├── templates
│       │   └── interfaces.j2
│       └── vars
│           └── main.yml
└── leaf.yml

The following files show how the playbook is run:

root@ubuntu# cat ansible.cfg
[defaults]
host_key_checking=False
hostfile = ansible.hosts

The ansible.hosts file specifies switch1 and switch2 as the DNS names of two bare metal switches running Cumulus Linux:

root@ubuntu# cat ansible.hosts
[switches]
switch1
switch2

The leaf.yml file is what is processed to run the playbook. It points to the roles that should be run.
root@ubuntu# cat leaf.yml
---
- hosts: switches
  user: root
  roles:
    - leaf

The tasks/main.yml file includes all the tasks being run; in this example, it overwrites the /etc/network/interfaces file with the template file roles/leaf/templates/interfaces.j2:

root@ubuntu# cat roles/leaf/tasks/main.yml
- name: configure interfaces
  template: src=interfaces.j2 dest=/etc/network/interfaces

root@ubuntu# cat roles/leaf/templates/interfaces.j2
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

{% if switches[inventory_hostname].interfaces is defined -%}
{% for item in switches[inventory_hostname].interfaces -%}
auto {{ item }}
iface {{ item }}
    mtu 9216
{% endfor -%}
{% endif -%}

{% if switches[inventory_hostname].start_port is defined -%}
{% for item in range(switches[inventory_hostname].start_port|int(), switches[inventory_hostname].stop_port|int() + 1) -%}
auto swp{{ item }}
iface swp{{ item }}
    mtu 9216
{% endfor -%}
{% endif -%}

Note that the stop value of range() is exclusive, so the template adds 1 to stop_port to include the last port in the range.

The vars/main.yml file is the template from which values are retrieved:

root@ubuntu# cat roles/leaf/vars/main.yml
switches:
  switch1:
    start_port: "1"
    stop_port: "44"
  switch2:
    interfaces: ["swp1", "swp2", "swp3", "swp4", "swp17", "swp18", "swp19", "swp20"]

For switch1, a range similar to the Mako example is used, where switch ports 1 through 44 are set. switch2 has a different configuration, where swp1 through swp4 are set, and then swp17 through swp20.

To run the playbook, use the ansible-playbook command. The -k flag allows you to use a plaintext password rather than SSH keys. For example:

root@ubuntu# ansible-playbook leaf.yml -k
SSH password:

PLAY [switches] ***************************************************************

GATHERING FACTS ***************************************************************
ok: [switch2]
ok: [switch1]

TASK: [leaf | configure interfaces] *******************************************
changed: [switch2]
changed: [switch1]

PLAY RECAP ********************************************************************
switch1                    : ok=2    changed=1    unreachable=0    failed=0
switch2                    : ok=2    changed=1    unreachable=0    failed=0

Automated Network Configuration Using Puppet

Puppet is an open source tool that automates configuration management through a controller that syncs with agents installed on each endpoint. While similar in functionality to Ansible, Puppet relies on agents installed on each switch being managed. Puppet utilizes TCP port 61613 for syncing between agents and the controller. In Puppet, a script that runs a series of tasks is referred to as a manifest, similar in concept to an Ansible playbook.

The following Puppet example repeats the examples previously shown with Mako and Ansible. On the controller, run the tree command to show where the manifest and related directories and files reside:

root@ubuntu# tree
.
├── auth.conf
├── autosign.conf
├── fileserver.conf
├── manifests
│   └── site.pp
├── modules
│   └── base
│       ├── manifests
│       │   ├── interfaces.pp
│       │   └── role
│       │       └── switch.pp
│       └── templates
│           └── interfaces.erb
├── puppet.conf
└── templates

The following files show how the manifest is run. The site.pp file is the main manifest. It contains site-wide and node-specific statements or definitions, blocks of Puppet code that are only included in a given node's catalog, the information specific to that node.
For example, this is where you specify the interfaces whose MTU you want to change to 9216:

root@ubuntu# cat manifests/site.pp
node 'switch2' {
  $int_enabled = true
  $int_mtu = {
    swp1 => {},
    swp2 => {},
    swp3 => {},
    swp4 => {},
  }
  include base::role::switch
}

In this example, the module is simply called base and contains a single manifest, interfaces.pp, containing the class base::interfaces. Classes generally configure all the packages, configuration files, and services needed to run an application.

root@ubuntu# cat modules/base/manifests/interfaces.pp
class base::interfaces {
  if $int_enabled == undef {
    $int_enabled = false
  }
  if ($int_enabled == true) {
    file { '/etc/network/interfaces':
      owner   => root,
      group   => root,
      mode    => '0644',
      content => template('base/interfaces.erb')
    }
    service { 'networking':
      ensure     => running,
      subscribe  => File['/etc/network/interfaces'],
      hasrestart => true,
      restart    => '/sbin/ifreload -a',
      enable     => true,
      hasstatus  => false,
    }
  }
}

The interfaces.erb file is a template that fetches the variables from site.pp. The template keeps eth0 on DHCP, checks whether int_mtu is defined in site.pp, then loops through each interface provided and sets its MTU to 9216.

root@ubuntu# cat modules/base/templates/interfaces.erb
auto eth0
iface eth0 inet dhcp

<% if @int_mtu %>
# interfaces
<% @int_mtu.each_pair do |key, value_hash| %>
auto <%= key %>
iface <%= key %>
    mtu 9216
<% end %>
<% else %>
# no interfaces
<% end %>

The role is included in the main manifest. It ties together all the manifests utilized for this particular node.

root@ubuntu# cat modules/base/manifests/role/switch.pp
class base::role::switch {
  include base::interfaces
}

The puppet.conf file is the main configuration file on both the Puppet master and the Puppet client. On the Puppet master side, change the dns_alt_names value from the default, puppet:

root@ubuntu# cat puppet.conf
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter
#modulepath=/etc/puppet/modules
#templatedir=$confdir/templates
dns_alt_names = puppet,cumulus-vm

[master]
# These are needed when the puppetmaster is run by passenger
# and can safely be removed if webrick is used.
ssl_client_header = SSL_CLIENT_S_DN
ssl_client_verify_header = SSL_CLIENT_VERIFY

On the Puppet client side, change the server value from the default, puppet:

cumulus@switch2:~$ cat /etc/puppet/puppet.conf
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$vardir/lib/facter
#templatedir=$confdir/templates
server=cumulus

[master]
# These are needed when the puppetmaster is run by passenger
# and can safely be removed if webrick is used.
ssl_client_header = SSL_CLIENT_S_DN
ssl_client_verify_header = SSL_CLIENT_VERIFY

Puppet agents run every 30 minutes by default. You can manually force a Puppet agent to run using the --test option.
For example:

cumulus@switch2:~$ sudo puppet agent --test
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for switch2
Info: Applying configuration version '1415654245'
Notice: /Stage[main]/Base::Interfaces/File[/etc/network/interfaces]/content:
--- /etc/network/interfaces    2014-11-10 22:39:55.000000000 +0000
+++ /tmp/puppet-file20141110-3198-b8i1bh    2014-11-10 22:40:20.714158235 +0000
@@ -3,4 +3,27 @@

+# interfaces
+
+auto swp1
+iface swp1
+    mtu 9216
+
+auto swp2
+iface swp2
+    mtu 9216
+
+auto swp3
+iface swp3
+    mtu 9216
+
+auto swp4
+iface swp4
+    mtu 9216
+
+
Info: /Stage[main]/Base::Interfaces/File[/etc/network/interfaces]: Filebucketed /etc/network/interfaces to puppet with sum 6f8a42d7ebd62f41c19324868384e095
Notice: /Stage[main]/Base::Interfaces/File[/etc/network/interfaces]/content: content changed '{md5}6f8a42d7ebd62f41c19324868384e095' to '{md5}ebf607a6ab09b595e81d1ff63e4b1196'
Info: /Stage[main]/Base::Interfaces/File[/etc/network/interfaces]: Scheduling refresh of Service[networking]
Notice: /Stage[main]/Base::Interfaces/Service[networking]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/Base::Interfaces/Service[networking]: Unscheduling refresh on Service[networking]
Notice: Finished catalog run in 6.99 seconds

Operational and Management Considerations

Authentication and Authorization

Cumulus Linux switches can be configured to use OpenLDAP v2.4 or later. Roles are used to segment privileges: No Access, Read Only, Administrator, and Custom. In Cumulus Linux, users belonging to the sudo group are equivalent to Administrators; users outside the sudo group can be created for read-only access. Refer to the Authentication, Authorization, and Accounting chapter of the Cumulus Linux documentation for more information.

Security Hardening

From a security hardening perspective, the following ports are used by Cumulus Linux in this scenario:

• TCP 22: SSH, needed for Cumulus Linux switch management
• TCP 5342: MLAG (default port), needed for MLAG communication between Cumulus Linux switches

Refer to the Netfilter - ACL chapter in the Cumulus Linux documentation for further information.
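For example, a hedged sketch of an ACL policy that restricts SSH to an out-of-band management subnet; cl-acltool and the policy.d drop-in directory are the standard Cumulus Linux ACL mechanism, while the file name and subnet are illustrative assumptions:

cumulus@switch$ sudo vi /etc/cumulus/acl/policy.d/60-mgmt.rules
[iptables]
# allow SSH only from the out-of-band management subnet
-A INPUT -p tcp --dport 22 -s 192.168.0.0/24 -j ACCEPT
-A INPUT -p tcp --dport 22 -j DROP

cumulus@switch$ sudo cl-acltool -i

Running cl-acltool -i installs all of the rules found under /etc/cumulus/acl/policy.d/.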
Accounting and Monitoring

Cumulus Linux log files are written to the /var/log directory. Key log files include:

Cumulus Linux Log File   Description
syslog                   Shows issues from the kernel, the Cumulus Linux HAL process (switchd), and almost all other application processes, such as DHCP and smond (look for facility and INFO entries).
daemon.log               Details when processes start and stop in the system.
quagga/zebra.log         Details issues from the Quagga zebra daemon.
quagga/{protocol}.log    Details issues from layer 3 routing protocols, like OSPF or BGP.
switchd.log              Logs activity on the switch monitored by the switchd process.
clagd.log                Logs activity of the clagd daemon for MLAG.

Quality of Service (QoS) Considerations

The use of industry-standard switches allows for unprecedented low hardware costs and the ability to add extra bandwidth and hot spares. QoS therefore becomes less important, and you do not have to think about it in the traditional sense; instead, make sure to provision sufficient bandwidth to minimize oversubscription. QoS at its core defines which traffic should be dropped and when, yet dropping traffic is almost never appropriate behavior in the modern data center. When applications begin choking due to bandwidth restrictions, it is much more productive to increase total bandwidth than to tweak QoS to starve certain traffic. Because hardware choice has reduced switch pricing significantly, expensive bandwidth is no longer a problem in today's data center.

Cumulus Networks strongly recommends not changing the default buffer and queue management values, as they are optimized for typical environments. However, many situations are unique and require further tuning based on the requirements of a particular network.

For buffer and queue management, Cumulus Linux divides inbound packets into four priority groups. By default, it assigns traffic to priority groups based on Class of Service (CoS) values rather than Differentiated Services Code Point (DSCP) values.

Priority Group   Description                                   Default CoS Values
Control          Highest priority traffic                      7
Service          Second highest priority traffic               2
Lossless         Traffic protected by priority flow control    none
Bulk             All remaining traffic                         0, 1, 3, 4, 5, 6

To change the default behavior, edit the /etc/cumulus/datapath/traffic.conf file:

cumulus@switch$ sudo vi /etc/cumulus/datapath/traffic.conf

To specify to which queue a CoS value maps, edit the priority_group.<name>.cos_list values. For example:

priority_group.control.cos_list = [1,3]

To change from CoS to DSCP, change the value of traffic.packet_priority_source, keeping in mind that DSCP values must then be mapped to each of the priority groups:

traffic.packet_priority_source = dscp

Changes to traffic.conf require you to run service switchd restart before they take effect. Note: This command is disruptive to all switch port interfaces.

cumulus@switch$ sudo service switchd restart

Refer to the Configuring Buffer and Queue Management chapter of the Cumulus Linux documentation for further information.

Link Pause

The pause frame is a flow control mechanism that halts transmission for a specified period of time. For example, a server or other network node within the data center may be receiving traffic faster than it can handle, and thus may benefit from using pause frames. In Cumulus Linux, individual ports can be configured to use link pause by:

• Transmitting pause frames when their ingress buffers become congested (TX pause enable), and/or
• Responding to received pause frames (RX pause enable)

You configure link pause by editing /etc/cumulus/datapath/traffic.conf. For example, to enable both types of link pause for swp2 and swp3:

# To configure pause on a group of ports:
#   uncomment the link pause port group list
#   add or replace a port group name in the list
#   populate the port set, e.g.
#     swp1-swp4,swp8,swp50s0-swp50s3
#   enable pause frame transmit and/or pause frame receive

# link pause
link_pause.port_group_list = [port_group_0]
link_pause.port_group_0.port_set = swp2-swp3
link_pause.port_group_0.rx_enable = true
link_pause.port_group_0.tx_enable = true

A port group refers to one or more sequences of contiguous ports. Multiple port groups can be defined by:

• Adding a comma-separated list of port group names to port_group_list
• Adding the port_set, rx_enable, and tx_enable configuration lines for each port group

You can specify the set of ports in a port group as comma-separated sequences of contiguous ports.
The syntax supports:

• A single switch port, like swp1s0 or swp5
• A range of regular switch ports, like swp2-swp5
• A sequence within a breakout switch port, like swp6s0-swp6s3

It does not accept a sequence that spans multiple split ports; for example, swp6s0-swp7s3 is not supported.

Restart switchd for your link pause configuration changes to take effect. Note: This command is disruptive to all switch port interfaces.

cumulus@switch$ sudo service switchd restart

Conclusion

Summary

The fundamental abstraction of hardware from software, giving customers choice through a hardware-agnostic approach, is core to the philosophy of Cumulus Networks, and it fits well both within the software-defined data center and within the confines of traditional layer 2 networks. Choice and CapEx savings are only the beginning; long-term OpEx savings come from the agility gained through automation. Cumulus Linux enables network and data center architects to leverage automated provisioning tools and templates to define and provision traditional layer 2 networks as if the switches were Linux servers with 50+ NICs. Cumulus Linux leverages the strength, familiarity, and maturity of Linux to propel networking into the 21st century.

References

Article/Document                                    URL
Cumulus Linux Documentation                         https://docs.cumulusnetworks.com
  Quick Start Guide
  Understanding Network Interfaces
  MLAG
  IGMP and MLD Snooping
  LACP Bypass
  Virtual Router Redundancy (VRR)
  Authentication, Authorization, and Accounting
  Configuring Buffer and Queue Management
Cumulus Linux Knowledge Base Articles
  Configuring /etc/network/interfaces with Mako     https://support.cumulusnetworks.com/hc/en-us/articles/202868023
  Configuring a Management Namespace                https://support.cumulusnetworks.com/hc/en-us/articles/202325278
  Demos and Training                                https://support.cumulusnetworks.com/hc/en-us/sections/200398866
Linux Training Videos from Cumulus                  https://cumulusnetworks.com/technical-videos/
Cumulus Linux Product Information
  Software Pricing                                  http://cumulusnetworks.com/product/pricing/
  Hardware Compatibility List (HCL)                 http://cumulusnetworks.com/hcl/
  Cumulus Linux Downloads                           http://cumulusnetworks.com/downloads/
Cumulus Linux Repository                            http://repo.cumulusnetworks.com
Cumulus Networks GitHub Repository                  https://github.com/CumulusNetworks/

Appendix A: Example /etc/network/interfaces Configurations

leaf01

cumulus@leaf01$ cat /etc/network/interfaces
auto eth0
iface eth0
    address 192.168.0.90/24
    gateway 192.168.0.254

# physical interface configuration
auto swp1
iface swp1
    mtu 9216

auto swp2
iface swp2
    mtu 9216

auto swp3
iface swp3
    mtu 9216
.
.
.
auto swp52
iface swp52
    mtu 9216

# peerlink bond for clag
auto peerlink
iface peerlink
    bond-slaves swp47 swp48
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4

# VLAN for clagd communication
auto peerlink.4094
iface peerlink.4094
    address 169.254.1.1/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.2
    clagd-backup-ip 192.168.0.91
    clagd-sys-mac 44:38:39:ff:40:94

# uplink bond to spine
auto uplink1
iface uplink1
    bond-slaves swp49 swp50
    bond-mode 802.3ad
    bond-lacp-rate 1
    bond-min-links 1
    bond-miimon 100
    bond-use-carrier 1
    bond-xmit-hash-policy layer3+4
    clag-id 1000

auto host-01
iface host-01
    bond-slaves swp1
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    bridge-vids 10,15,20,21,22,23,30,40
    mstpctl-portadminedge yes
    mstpctl-bpduguard yes
    clag-id 1

auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports peerlink uplink1 host-01
    bridge-vids 10,15,20,21,22,23,30,40,1000-2000
    bridge-pvid 1
    bridge-stp on

leaf02

cumulus@leaf02$ sudo vi /etc/network/interfaces
auto eth0
iface eth0
    address 192.168.0.91/24
    gateway 192.168.0.254

# physical interface configuration
auto swp1
iface swp1
    mtu 9216

auto swp2
iface swp2
    mtu 9216

auto swp3
iface swp3
    mtu 9216
.
.
.
auto swp52
iface swp52
    mtu 9216

# peerlink bond for clag
auto peerlink
iface peerlink
    bond-slaves swp47 swp48
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4

# VLAN for clagd communication
auto peerlink.4094
iface peerlink.4094
    address 169.254.1.2/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.1
    clagd-backup-ip 192.168.0.90
    clagd-sys-mac 44:38:39:ff:40:94

# uplink bond to spine
auto uplink1
iface uplink1
    bond-slaves swp49 swp50
    bond-mode 802.3ad
    bond-lacp-rate 1
    bond-min-links 1
    bond-miimon 100
    bond-use-carrier 1
    bond-xmit-hash-policy layer3+4
    clag-id 1000

auto host-01
iface host-01
    bond-slaves swp1
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    bridge-vids 10,15,20,21,22,23,30,40
    mstpctl-portadminedge yes
    mstpctl-bpduguard yes
    clag-id 1

auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports peerlink uplink1 host-01
    bridge-vids 10,15,20,21,22,23,30,40,1000-2000
    bridge-pvid 1
    bridge-stp on

spine01

cumulus@spine01$ sudo vi /etc/network/interfaces
auto eth0
iface eth0
    address 192.168.0.94/24
    gateway 192.168.0.254

# physical interface configuration
auto swp1
iface swp1
    mtu 9216

auto swp2
iface swp2
    mtu 9216

auto swp3
iface swp3
    mtu 9216
.
.
.
auto swp32
iface swp32
    mtu 9216

# peerlink bond for clag
auto peerlink
iface peerlink
    bond-slaves swp31 swp32
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4

# VLAN for clagd communication
auto peerlink.4093
iface peerlink.4093
    address 169.254.1.1/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.2
    clagd-backup-ip 192.168.0.95
    clagd-sys-mac 44:38:39:ff:40:93

# leaf01-leaf02 downlink
auto downlink1
iface downlink1
    bond-slaves swp1 swp2
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 1

auto downlink2
iface downlink2
    bond-slaves swp3 swp4
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 2

# Need connection to core
auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports peerlink downlink1 downlink2
    bridge-vids 10,15,20,21,22,23,30,40,1000-2000
    bridge-pvid 1
    bridge-stp on
    mstpctl-treeprio 4096

spine02

cumulus@spine02$ sudo vi /etc/network/interfaces
auto eth0
iface eth0
    address 192.168.0.95/24
    gateway 192.168.0.254

# physical interface configuration
auto swp1
iface swp1
    mtu 9216

auto swp2
iface swp2
    mtu 9216

auto swp3
iface swp3
    mtu 9216
.
.
.
auto swp32
iface swp32
    mtu 9216

# peerlink bond for clag
auto peerlink
iface peerlink
    bond-slaves swp31 swp32
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4

# VLAN for clagd communication
auto peerlink.4093
iface peerlink.4093
    address 169.254.1.2/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.1
    clagd-backup-ip 192.168.0.94
    clagd-sys-mac 44:38:39:ff:40:93

# leaf01-leaf02 downlink
auto downlink1
iface downlink1
    bond-slaves swp1 swp2
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 1

auto downlink2
iface downlink2
    bond-slaves swp3 swp4
    bond-mode 802.3ad
    bond-miimon 100
    bond-use-carrier 1
    bond-lacp-rate 1
    bond-min-links 1
    bond-xmit-hash-policy layer3+4
    clag-id 2

# Need connection to core
auto bridge
iface bridge
    bridge-vlan-aware yes
    bridge-ports peerlink downlink1 downlink2
    bridge-vids 10,15,20,21,22,23,30,40,1000-2000
    bridge-pvid 1
    bridge-stp on
    mstpctl-treeprio 4096

oob-mgmt

When utilizing a management network switch running Cumulus Linux, the switch can be configured by editing the /etc/network/interfaces file with the following configuration:

cumulus@oob-mgmt$ sudo vi /etc/network/interfaces
auto br0
iface br0
    bridge-ageing 300
    bridge-ports regex (swp[0-9]*[s]*[0-9])
    bridge-stp on

and reloading all networking:

cumulus@oob-mgmt$ sudo ifreload -a

Appendix B: Network Design and Setup Checklist

1. Set up physical network and basic configuration of all switches.

Select network switches
  Refer to the HCL and hardware guides at http://cumulusnetworks.com/support/hcl.
  Out-of-band management: Assume minimal traffic requirements, used for initial image loading and then management and monitoring, with no day-to-day data traffic; a 48 port 1G switch is sufficient.
  Leaf switches: Choose at least a 48 port 10G switch, or a 32 port 40G switch with breakout cables. Consider price and future proofing; breakout cables provide more 10G ports on a 40G switch than a single 10G switch does.
  Spine switches: Choose at least a 10G switch, or a 40G switch for more traffic aggregation. Consider price and future proofing. Use identical switches in pairs for easier management, plus extras for hot spares.

Plan cabling
  Refer to the knowledge base article Suggested Transceivers and Cables: https://support.cumulusnetworks.com/hc/en-us/articles/202983783.
  Generally, the higher-numbered ports on a switch are reserved for uplinks, so:
  • Assign downlinks or host ports to the lower end, like swp1, swp2
  • Reserve higher-numbered ports for the network
  • Reserve the highest ports for MLAG peer links
  Connect all console ports. See the Quick Start Guide in the Cumulus Linux documentation.

Install Cumulus Linux
  Obtain the latest version of Cumulus Linux.
  Obtain the license key, which is separate from the Cumulus Linux OS distribution.
  To minimize variables and aid in troubleshooting, use identical versions across switches: the same version X.Y.Z, packages, and patch levels. At a minimum, ensure switches in MLAG pairs run identical versions.
  See the Quick Start Guide in the Cumulus Linux documentation.

Reserve management space
  Reserve a pool of IP addresses. Define hostnames and DNS. RFC 1918 addresses should be used where possible.

Determine IP addressing
  Use DHCP to avoid manually configuring each switch with its gateway, IP address, DNS information, hostname, zero touch provisioning URL, and ONIE installation URL. Or use static IP addresses for explicit control, at the cost of managing a MAC-address-to-IP-address table.

Edit configuration files
  Apply standards and conventions to promote similar configurations. For example, place stanzas in the same order in configuration files across switches, and define child interfaces before the parent interfaces that use them (a bond member appears earlier in the file than the bond itself). This allows for standardization, easier maintenance and troubleshooting, and simpler automation and templating.
  Consider naming conventions for consistency, readability, and manageability, which also helps facilitate automation. For example, call your leaf switches leaf01 and leaf02 rather than leaf1 and leaf2. Use all lowercase for names, and avoid characters that are not DNS-compatible.
  Refer to the Configuring and Managing Network Interfaces chapter of the Cumulus Linux documentation for more information.

2. Configure leaf switches.

Define switch ports (swp) in /etc/network/interfaces on each switch
  Instantiate the swp interfaces so they can be managed with the ifup and ifdown commands.

Set MTU
  By default, the MTU is 1500. Set a high value, like 9216, to avoid packet fragmentation.

Set speed and duplex
  These settings depend on your network.

Create the peer link bond between each pair of switches
  Assign an IP address for the clagd peer link. Consider using a link local address (RFC 3927, 169.254/16) to avoid advertising it, or an RFC 1918 private address.
  Use a very high-numbered VLAN, if possible, to separate the peer communication traffic from the typical VLANs handling data traffic; valid VLAN tags end at 4094.

Enable MLAG
  Set up MLAG in switch pairs.
  There is no particular order necessary for connecting the pairs.

Assign clagd-sys-mac
  Assign a unique clagd-sys-mac value per pair, from the range reserved for Cumulus Networks: 44:38:39:FF:00:00 through 44:38:39:FF:FF:FF. This value is used in the spanning tree calculation, so unique values prevent overlapping MAC addresses.

Assign priority
  Define primary and secondary switches in an MLAG configuration, if desired; otherwise, by default the switches elect a primary switch on their own. Set the priority if you want to explicitly control which switch is designated primary.

Automate setup
  Use automation for setting up many switches. Set up a pair of switches manually and verify all connectivity first, before attempting to recreate the configuration programmatically. Try to configure switches as similarly as possible; the main differences between two switches are the loopback address, management IP address, MLAG peer, and VRR addresses.

3. Configure spine switches.

Repeat the steps used for the leaf switches
  The steps for configuring spine switches are similar to those for leaf switches. Consider using a different VLAN number for the spine peer bond than for the leaf peer bonds, to avoid accidentally trunking across the same VLAN.

4. Set up the spine/leaf network fabric.

  Create a switch-to-switch "uplink" bond on each leaf switch to the spine pair, and verify MLAG. Use different clag-ids for uplinks versus host bonds.
  Create a switch-to-switch bond on the spine pair to each leaf switch pair, and verify MLAG.

5. Configure VLANs.

Define VLANs
  Determine the list of VLANs needed, and which VLANs belong on which interfaces. Prune VLANs where possible. Set the native VLAN for trunk ports.

6. Connect and configure hosts.

Set up high availability on hosts
  Set LACP to fast mode (the default).
  Enable BPDU guard on host-facing ports to prevent a switch from being connected on a host port.

7. Connect spine switches to core.

Connect to the core switch at layer 2, if applicable
  Check the MTU setting for the connection to the core; this depends on what the core needs.
  Check MLAG-type capability.
  Determine which VLANs need to be trunked to the core rather than pruned.
  Ensure the native VLAN matches the core native VLAN.

Connect to the core switch at layer 3, if applicable
  Check the MTU setting for the connection to the core; this depends on what the core needs.
  Determine how to handle the default route: originate or learn?
  Specify the IP address subnet information for layer 3 VLANs. Decide what IP address to use for the gateway; typically this is either .1 or .254.
  Assign IP addresses for VRR. Typically they are adjacent to the gateway, such as .2 or .253.
  Assign virtual MAC addresses for VRR. For better manageability, use the same MAC address on both peer switches, from the range reserved for VRRP: 00:00:5E:00:01:XX.
  Determine whether routing will be static or dynamic. If dynamic:
  • Specify the router-id and advertised networks.
  • Consider IPv4 and/or IPv6.
  • Determine the protocol, OSPF or BGP.
  OSPF:
  • Define the area.
  • Verify the MTU setting; OSPF in particular can run into problems with improper MTU settings.
  • Define the reference bandwidth.
  • Set timers.
  • Define the network type, such as point-to-point.
  • Choose between OSPF numbered and OSPF unnumbered interfaces.
  BGP:
  • Define the autonomous system number (ASN).
  • Set timers.
  • Choose between iBGP and eBGP.
Key Features
- Multi-chassis link aggregation (MLAG)
- Link aggregation
- High availability
- Hardware agnosticism
- Open source protocols & tools
- Automation & orchestration
- Converged administration
- Zero touch provisioning
- Quality of Service (QoS)
Frequently Asked Questions
What is MLAG and how does it work?
MLAG is a technology that allows you to create a logical switch that spans multiple physical switches. This is done by creating a peer link between the two switches and configuring a clagd daemon on each switch. The clagd daemon communicates with its peer on the other switch across a layer 3 interface between the two switches. This layer 3 network should not be advertised by routing protocols, nor should the VLAN be trunked anywhere else in the network. This interface is designed to be a keep-alive reachability test and for synchronizing the switch state across the directly attached peer bond.
How can I connect my hosts to the leaf switches?
Hosts can be connected to leaf switches via dual 10G links. These links should be configured as a LACP bond to ensure high availability.
What is the purpose of the out-of-band management network?
The out-of-band management network is used to administer the infrastructure elements, such as network switches, physical servers, and storage systems. It is expected to host both DHCP and HTTP servers, such as isc-dhcp and apache2, as well as provide DNS reverse and forward resolution.
How do I scale out my Layer 2 network?
Scaling out the architecture involves adding more hosts to the access switch pairs, then adding more access switches in pairs as needed. As the spine switch pair approaches its limits, an additional network pod of spine/leaf switches may be added.
How can I connect my spine switches to the core?
You can connect your spine switches to the core at either Layer 2 or Layer 3. At Layer 2, the core switches need to support a vendor-specific form of MLAG. At Layer 3, the spine switches route traffic and you will need to configure layer 3 switch virtual interface (SVI) gateways.