Integrating network monitoring, automation, and notification tools to centralize and automate network processes

Lessons learned from an integrated HP Network Automation and Network Node Manager i software deployment with TelAlert® notification in an MPLS environment

Technical white paper

IT challenges in a world of acquisitions

After years of growth by acquisition, a midsize U.S.-based energy company was facing formidable IT challenges. From a technology standpoint, the company needed to assimilate diverse network architectures acquired from multiple companies with offices in several states. From an operations standpoint, the company needed to reduce the mean time to repair (MTTR) for network outages in field offices. Typically, MTTR was an unacceptable two and a half to three days.

While the company’s petroleum reserves were doubling in size every four months, budgets did not allow network operations staff to grow linearly with the company. This meant it needed to find integrated and automated approaches that would enable the company’s network operations team to increase its productivity.

To address these challenges, the company turned to Allen Corporation of America, a highly regarded professional services firm and HP Software and Solutions partner.

A comprehensive solution

Working closely with the energy company, Allen developed a comprehensive network automation and monitoring solution with integrated notification capabilities. The solution was designed to leverage the combined capabilities of HP Network Automation software, HP Network Node Manager i (NNMi) software, and the TelAlert notification system.

• HP Network Automation (version 7.5) was selected for its ability to standardize network configurations. It tracks, regulates, and automates configuration and software changes across globally distributed, multivendor networks.
• HP NNMi Advanced (version 8.13) was selected to centralize network monitoring. It provides tools for managing unified fault, availability, performance, and advanced network services for physical and virtualized network infrastructure.
• HP NNMi was integrated with TelAlert to automate the process of notifying operations personnel of network-related issues and to avoid the need to staff a network operations center (NOC) around the clock.

This paper presents Allen’s key advice on three important aspects of the solution: integrating HP Network Automation with HP NNMi, monitoring MPLS networks with HP NNMi, and stabilizing staffing using TelAlert notification.

Integrating HP NNMi with HP Network Automation

HP Network Automation and HP NNMi are designed to work in a complementary manner. HP Network Automation finds and configures new devices on the network and then passes the device information to HP NNMi.

Integration takes place at the graphical user interface (GUI) level. This integration brings HP Network Automation diagnostics into HP NNMi. In the background, the two applications share data with each other. This data integration allows information on devices to be imported into HP Network Automation from HP NNMi. To make this match, HP Network Automation must know the universally unique ID (UUID) that HP NNMi gives to a device. This UUID is the tag the two applications share to make the integration work.

Linking HP Network Automation with HP NNMi

To link HP Network Automation with HP NNMi, you run a connector installer on the HP Network Automation server. It connects to HP NNMi and installs the components there as well. The two applications can run on a single server or different servers. HP NNMi is installed first, followed by HP Network Automation, which configures itself around HP NNMi.

The integration team initially installed the two applications on a single server, and then later decided to move HP Network Automation to its own machine. They discovered that breaking and then re-establishing the integration causes a lot of extra work. For example, HP NNMi continues to look for HP Network Automation on the same server.

A lesson learned: Think your way through the impact of putting both applications on a single system, especially in light of HP NNMi’s memory requirements when managing a large number of nodes. If you’re not sure which approach will work best for you, take the safe route and put the applications on different servers.

Importing HP NNMi devices into HP Network Automation

To import HP NNMi devices into HP Network Automation, run nnmimport on the HP NNMi server; it queries HP Network Automation for a list of supported object IDs (OIDs). HP NNMi then sends HP Network Automation information on only the nodes with supported OIDs.

In the integration effort, the energy company wanted all devices from HP NNMi to be sent to HP Network Automation, which automates the complete operational lifecycle of network devices. This approach leverages the comprehensive discovered network inventory in NNMi.

To meet this requirement, the integration team configured HP Network Automation to essentially pretend that it supported a broader list of OIDs. The team did this by adding OIDs to a configuration file on the Network Automation server, as shown in Figure 1. This workaround enabled HP Network Automation to receive information on all the devices in the HP NNMi database.

Figure 1. Adding devices from HP NNMi to HP Network Automation
To add devices to HP Network Automation’s list of supported devices, you add the OIDs from HP NNMi.

Monitoring MPLS with HP NNMi

In the discovery process, HP NNMi queries devices to determine what they connect with, and then creates a map showing how devices are connected. When devices are on MPLS networks, HP NNMi doesn’t have the information it needs to understand how they are connected to the rest of the environment.
That’s because the switches and routers in a service provider’s environment are not exposed to customers who are using the MPLS service. This means that HP NNMi can’t see the explicit physical connection at Layer 2 or Layer 3. This reality makes discovery across virtual boundaries inherently difficult in MPLS networks. When HP NNMi can’t determine how devices are connected, it shows them as islands on a network, as shown in Figure 2.

Figure 2. Discovery islands
HP NNMi provides a map that shows how devices on the network are connected. If it can’t determine how devices are connected, HP NNMi shows them as islands.

On an NNMi network map, failed devices turn red to indicate a critical state. When a device in an MPLS service provider’s environment fails, a connection in the NNMi topology breaks, but the device causing it is not in the NNMi topology, so no device turns red on the map. In these cases, HP NNMi turns the nodes in the island isolated by the failed MPLS connection to blue, to indicate unknown status.

When an MPLS site has many nodes, HP NNMi administrators often create a container to show just one symbol on the network map to represent the entire site. If all of the nodes in this container are blue, indicating unknown status, the container itself still appears as green on the network map, suggesting that all is well with the nodes in the container and providing no indication on the map that the connectivity to the site is down. The container would turn red if one of the nodes within the container turned red. But that doesn’t happen because HP NNMi doesn’t know the status of the individual nodes; it only knows that it can’t connect to the site. NNMi does post a critical “Root Cause” incident, indicating that the site is isolated, but the site container does not change to red by default.

In the integration effort, the customer wanted the container to turn red if the site was isolated, so operators could see that they had problems at the site represented by the container. To make this happen, the Allen team used the “Important Nodes” node group in HP NNMi to change the default behavior of the software.

When a node is in the “Important Nodes” group, it turns red when its status is unknown. With this understanding in mind, the Allen team created a filter to automatically populate all of the routers within MPLS containers into the “Important Nodes” node group. With that designation, when the status of an MPLS router becomes unknown, it turns red, as does the container it is housed in on the network map. The result is that an MPLS outage now causes the isolated sites to turn red on all map displays.

Stabilizing staffing using TelAlert notification

In a time when the demands on network operators are growing faster than staff headcounts, automation is one of the keys to containing staffing levels and costs. This was the case with the energy company, which couldn’t justify the expense of staffing a NOC around the clock. Instead, it wanted to leverage an automated notification system that would work in tandem with HP NNMi to alert its operators to network problems.

The customer did not want to bombard its operators with alerts that indicated HP NNMi was in the process of investigating potential network problems. It felt that they didn’t need to be made aware of each step in the process. What’s more, many network incidents are only momentary issues, such as power glitches, that are quickly cleared by HP NNMi.

With these thoughts in mind, the company decided to suppress the step-by-step incident updates that are issued as HP NNMi progresses through its root-cause analysis. It wanted the notification system to send out a single message with a final answer, such as “Node Down,” if any message had to be sent at all.

To meet these requirements, the Allen team configured HP NNMi to post messages to TelAlert with a built-in three-minute delay when specific incidents enter the “Registered” state. When these incidents change state to “Closed,” a second call is made to TelAlert to cancel the message. If the second call is received within three minutes, TelAlert does not send the message.

In one exception to the three-minute-delay rule, the customer wanted three core network team members to be alerted to any incident that was kicked off by a linkdown trap. The Allen team programmed the solution to do this.

New feature in HP NNMi 9

HP NNMi 9 introduces lifecycle transition actions customized to node groups. Using this feature, different messages can be delivered through TelAlert based on a device’s node group membership.

The case for automated notification

Staffing a NOC on a 24/7 basis is a costly proposition. To guarantee that at least one person is in your operations center watching the console at all times, you need to have 13 people on payroll. Why is that?

This level of staffing, known as the Rule of 13, allows for a 40-hour week, vacation, training, and sick time. This rule also acknowledges the reality that you actually must schedule two people for each shift, so that when one leaves to use the restroom, the NOC remains staffed. It further assumes that each shift is ten hours to allow handoffs between shifts, that the normal work week is four ten-hour days, and that both shifts report for training once a week.

The good news is that the first 13 people will support a workload of up to two people at a time. The bad news is that after you exceed the load that two people can handle, each additional seat you need to fill in the operations center will require eight more people on payroll.

Numbers like these are a key driver for automated notification systems that free operators from the need to be physically present in a NOC.
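
The staffing arithmetic behind the Rule of 13 can be written out as a small function. The Python sketch below is illustrative only: the function name noc_payroll and the constant names are hypothetical, and the figures (13 people to cover up to two seats, eight more people for each additional seat) are taken directly from the text above rather than from any formal staffing model.

```python
# Illustrative sketch of the "Rule of 13" staffing arithmetic described above.
# All names are hypothetical; the numbers are the ones quoted in the text.

BASE_PAYROLL = 13            # people needed to keep a 24/7 NOC staffed at all
BASE_SEATS = 2               # concurrent seats the first 13 people can cover
PAYROLL_PER_EXTRA_SEAT = 8   # additional people for each seat beyond the first two


def noc_payroll(seats: int) -> int:
    """Return the headcount needed to keep `seats` console seats filled 24/7."""
    if seats < 0:
        raise ValueError("seats must be non-negative")
    if seats == 0:
        return 0  # no NOC to staff
    extra_seats = max(0, seats - BASE_SEATS)
    return BASE_PAYROLL + PAYROLL_PER_EXTRA_SEAT * extra_seats


if __name__ == "__main__":
    for seats in range(1, 5):
        print(f"{seats} seat(s): {noc_payroll(seats)} people on payroll")
```

Under these assumptions, one or two seats require 13 people, three seats require 21, and four seats require 29, which shows how quickly round-the-clock staffing costs grow.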
Moving forward

In this project, the integration of HP Network Automation, HP NNMi, and TelAlert notification yielded a highly automated, customized solution for network management. This solution is helping the energy company assimilate acquired technology, maintain standardized configurations, centralize network monitoring, and reduce staffing demands with automated notification.

In addition, the use of HP Network Automation has helped the company condense processes and resolve network outages in field offices in less time. With HP Network Automation, the company has reduced MTTR by 50 percent.

Looking ahead, the company recognizes that its networks and IT systems must continue to evolve. In recognition of this, Allen is working with the company to lay the groundwork for a planned upgrade to new versions of HP NNMi and HP Network Automation software. In addition, the company is looking at other offerings in the HP Data Center Automation suite and HP Client Automation suite.

About Allen Corporation

Allen Corporation of America, Inc. is a professional services company offering industry-leading information technology, logistics, cyber security, and training solutions to the private and public sectors. Allen has its headquarters in Fairfax, Virginia, and offices throughout the United States and in Europe. To learn more, visit www.allencorp.com.

For more information

To learn more about HP NNMi, visit www.hp.com/software/nnmi.
To learn more about HP Network Automation, visit www.hp.com/go/nasoftware.
To learn more about Allen Corporation, visit www.allencorp.com.

Get connected
www.hp.com/go/getconnected
Get the insider view on tech trends, alerts, and HP solutions for better business outcomes

© Copyright 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

TelAlert is a registered trademark of MIR3. MIR3 is a service mark of MIR3, Inc.

4AA2-6870ENW, Created November 2010