Mohammedadem Abdulkadir

Monitoring Network Availability Using Nagios

Helsinki Metropolia University of Applied Sciences
Bachelor of Engineering
Information Technology
Thesis
5 May 2015

Abstract

Author: Mohammedadem Abdulkadir
Title: Monitoring network availability using Nagios
Number of Pages: 35 pages
Date: 5 May 2015
Degree: Bachelor of Engineering
Degree Programme: Information Technology
Instructor(s): Bruk Yirdaw, Project Manager; Matti Puska, M.Sc, Principal Lecturer

The main goal of the thesis was to implement a simple network monitoring system which would ensure the availability of network devices and services at the Metropolia UAS Communications and Network Engineering laboratory, Leppävaara campus. The system needed to be easy to use, cost-effective and compatible with critical lab services and infrastructure components.

Many cost-effective network monitoring tools are currently available. This project began by short-listing and comparing three open-source software solutions. They were evaluated on their advantages and disadvantages with respect to the goals, for example their ability to send alarms as e-mail or SMS (Short Message Service) notifications.

A physical network was designed with three computers connected to a switch, which was connected to the local network. Each computer served as a server for one of the listed software solutions. Installation and implementation were carried out on a Linux operating system. Comparing and contrasting the tools helped to analyse them and choose the management tool that best met the demands of the project.

The end results of the project show that the availability of network devices and services was monitored and that the system could generate e-mail notification alarms in case of network failure.
Keywords: network management, open source, laboratory, Linux, SMS

Contents

1 Introduction
2 Theoretical Background
2.1 Network Management
2.2 Network Management Operations
2.2.1 Fault Management
2.2.2 Performance Management
2.2.3 Security Management
2.2.4 Configuration Management
2.2.5 Accounting Management
2.3 Network Management System Architecture
2.4 Network Management Protocol
2.4.1 Simple Network Management Protocol (SNMP)
3 Methodology
3.1 Requirements
3.2 Materials Used
3.3 Network Topology and Addressing Table
3.4 Software
3.4.1 Zabbix
3.4.2 Cacti
3.4.3 Nagios
3.5 Software Selection
4 Details of Nagios Implementation
4.1 Architecture and Setup
4.2 Soft and Hard States of Nagios
4.3 Nagios Configuration Files
4.4 Nagios Plugins
4.5 Writing New Plugins
5 Results and Discussion
6 Conclusion
References

1 Introduction

Network technicians and programmers work hard to come up with new technologies and innovations that make everyday life easy and simple. Today almost every device is networked, from simple devices to complex company systems, and to customers everything appears to work smoothly and without problems. However, service providers invest large amounts of money and time to make sure that everything works. The problem is how to manage every networked device and make sure that it keeps working without affecting the daily life of customers. [2, 15-19]

The Metropolia UAS Communications and Network Engineering Laboratory, Leppävaara campus, runs many physical and virtual network devices such as servers, network switches, routers, personal computers and other network devices. The laboratory staff found it difficult to monitor the status of all these devices through physical check-ups, which is both a tedious and time-consuming job.
Therefore the department decided to have a system which would monitor the status of these devices through an effective, simple and cost-effective monitoring solution.

Network technicians and programmers have developed network management tools to assess the status of each device and to report on the results before clients notice a problem, so that it is easy to take action on the affected device. These management tools can be free, partially free or commercial. [2]

The main goal of the project is to implement a simple and cost-effective system which monitors the status of network devices and services at the network laboratory. The system to be implemented should be:

- Easy to install and simple to use
- Easy to maintain and to upgrade when needed
- Compatible
- Able to send notifications in case of failure
- Open source or partially open source

To meet the above demands I went through a selection of the monitoring tools available and finally settled on three open-source network monitoring tools. Details of these tools are described in the following chapters.

Initially, the three tools were installed on three personal computers in the laboratory room, with one switch connected to the school internet network. Using these tools, it was possible to monitor the status of sample devices and to compare the benefits and drawbacks of the software against the objectives stated above. Based on the evaluation with respect to my objectives, I selected the most suitable one.

2 Theoretical Background

2.1 Network Management

Network Management is a way of managing and maintaining network operations and responding to changes in the network according to the user requirements. [3] In today’s network environment, it is challenging and time-consuming to manage a large number of network devices and services by checking them manually. Rather, it is wise to use a protocol such as the Simple Network Management Protocol (SNMP) to monitor these devices.
Implementing a simple monitoring system helps with the following:

- Tracking the availability and reachability of network devices and services
- Tracking the performance of network devices and services
- Keeping a history of network operations
- Keeping an audit trail of changes
- Scheduling upgrades

The main aim of network management is therefore to identify network problems, so that they can be fixed before end-users notice them. The second important point is that the system allows one to view the trends of network devices and services across the network. Furthermore, a good network management system helps to track the usage of resources, for example CPU usage, the temperature of the server room and disk usage, and to implement an alarm mechanism which notifies the administrator when a fault occurs or when the performance of a network device or service goes beyond the limit defined by the user. [3]

2.2 Network Management Operations

2.2.1 Fault Management

Fault Management is one of the main components of network management. Identifying and analysing the cause of a fault in a network is usually more time-consuming than solving the problem. Hence a properly implemented network management system can keep the network running at an optimum level. [5] Faults in a network can have different causes, including software and hardware faults.

The main functions of fault management include:

- Remote monitoring of network devices and services from a single location
- Monitoring the status of devices and services constantly at a certain interval
- Setting threshold limits for potential failures
- Implementing alarm mechanisms which notify network administrators
- Tracing the location where a failure occurs, which makes it easier to identify and fix the affected device or service
- Taking the necessary measures

In general, Fault Management can be divided into two parts, passive fault management and active fault management.
Active fault management addresses problems by monitoring devices and services using monitoring tools and checking whether each network device is available and responding. Passive fault management deals with notification mechanisms; it sends alarms when a device or a service encounters problems. [6]

2.2.2 Performance Management

Performance Management is one of the high-level management operations. Its main tasks are monitoring and controlling the network performance; this includes gathering statistical data from the network traffic, investigating and analysing the log history of the network, and analysing whether the trends of the network are positive or negative and evaluating them based on the data provided. [2]

To measure the performance of a network, it is important to analyse the collected records and go through the trends of the data. Proper analysis of performance records helps the network administrator to:

- Figure out different mechanisms for enhancing the performance of the network in the future
- Forecast the threshold levels
- Set appropriate threshold levels, so that an alarm is generated when a value exceeds these limits, indicating that attention is needed

There are different types of network monitoring tools which monitor performance parameters such as network traffic flow, bandwidth, speed and media capacity. These parameters are presented graphically, as percentages or in other forms, so that the network manager can analyse the trends, evaluate them against the defined threshold levels and take the necessary action. [5; 6]

2.2.3 Security Management

Security Management is the core of the Network Management Operation (NMO). It is responsible for securing the flow of the network traffic and making sure that it flows smoothly and is protected from outside intrusions. [4] A network without proper security is something like a country without defence forces.
There should be forces which detect and prevent threats to the security of the nation from both inside and outside the country. [4; 5] The same is true of a network: security management detects and prevents intrusions into the network.

Security Management is concerned with access rights to network devices and services. It is responsible for protecting user information gathered from the network devices. It is very important to consider security issues when monitoring network devices and services. For example, only authorized personnel should be allowed to access the status of the devices and the detailed information about the nodes and services, so that they are protected from external damage. Another advantage of strong security measures in a network environment is that they prevent unauthorized personnel from making changes to the configuration of services and devices.

2.2.4 Configuration Management

There are different network devices in a large networked environment, and these devices can be configured to perform different applications. For example, a personal computer can be configured to act as a server, or the same device can be configured to serve as a switch, as a router or as both. Once it has been decided which application a device will perform, it is the configuration manager who chooses what kind of software is required and sets the values for it.

Configuration Management can be described as an important function inside network management which monitors network and system information so that the effect of network operations can be traced and managed. [5] The configuration manager can start up and shut down the network or part of the network.
A configuration management system stores information about each network device and the versions of the application software installed on it, such as SNMP (Simple Network Management Protocol) version 3.1 or TCP/IP (Transmission Control Protocol/Internet Protocol) software version 2.0. This information is stored in a database, where it can be easily accessed when a problem occurs on the device and helps to solve the problem.

2.2.5 Accounting Management

Accounting Management is another important network management function; it tracks the usage of resources so that customers can be charged according to their usage. [5, 6] The accounting management system is also called the allocation level because it distributes resources optimally and fairly among users. [7]

Accounting Management is mostly related to billing users for their network usage, for example monitoring the use of a server by a group of users and charging them for the resources they use. Accounting management helps to gather data about network utilization; this can be done by gathering the traffic counters of switches and routers. [6, 7]

Billing of users can be performed according to:

- The total number of transactions; this includes the number of logins to a computer or server, e-mails sent and other login sessions
- The total number of packets; the charges may vary with the size of the packets, and large packets are charged less than small packets
- The total number of bytes; in this case users are billed for receiving packets. This billing system has its own drawbacks, i.e. users are also billed for receiving acknowledgement packets. [7]

2.3 Network Management System Architecture

A Network Management System is designed to show the hierarchy and relationships between the managed devices and the management entity within the network. [7] Figure 1 shows a network management system architecture, which helps to view the whole network as a unified architecture.
Figure 1: Network management system architecture [7]

The components shown in the figure above are explained as follows:

- Managed devices: end-devices such as personal computers, switches, routers and other network devices. These devices run network management software which enables them to send alerts to the management entity in case of problems. For example, if a control centre sets a 20% packet loss threshold for a network device and the packet loss exceeds this limit, the device sends an alert to the management entity. When the management entity, which is programmed to execute different actions such as notification, receives the alert from the device, it takes action to fix the problem.
- Management entity: programmed to respond to alerts issued by network devices and to take actions according to predefined settings.
- Agents: software modules which collect network management data, store it in the management database and send it to the management entity within the network. The data is sent to the management entity using a network management protocol such as SNMP or the Common Management Information Protocol (CMIP).
- Proxies: entities which collect information and send it to the management entities on behalf of other entities. [7; 8]

2.4 Network Management Protocol

Network management protocols are used to send management data between the targeted network devices and the management console. [16] The network management protocol used most in this thesis is SNMP, and it is discussed below.

2.4.1 Simple Network Management Protocol (SNMP)

SNMP is a network protocol used to manage network devices such as workstations, servers, switches, routers and other devices which run network management software. A management station (manager) collects status information from network devices running an SNMP agent on a TCP/IP network.
[8] To do management tasks, SNMP uses two other protocols:

- Structure of Management Information (SMI): defines the general rules for naming objects, defining object types and showing how to encode them.
- Management Information Base (MIB): creates a collection of named objects, their types and their relationships to each other in an entity to be managed.

In other words, management on the internet is done through the cooperation of three protocols: SNMP, SMI and MIB. SNMP uses the services of UDP (User Datagram Protocol) on two well-known ports, 161 and 162. Port 161 is used by the agent and port 162 is used by the client (manager). [17]

There are two ways of communication between the manager and the agent. In the first, the manager sends a get request to retrieve information from the agent, or a set request to change some values. In the second, an agent wants to notify the manager about a fault, and an SNMP trap is sent out. [1, 224-225] Figure 2 illustrates the communication types of SNMP.

Figure 2: Communication types of SNMP. [1]

As its name indicates, SNMP is simple but powerful. It helps to manage the network by:

- Gathering performance information about target devices, for example bandwidth
- Sending alarms in case of failure of network devices
- Monitoring the status of critical services such as memory use, CPU load, etc.
- Performing active polling by querying devices at certain intervals
- Having read and write access to network devices, so that it is possible to switch a single port in a switch on or off
- Monitoring the air temperature inside the data centre (server room)

There are three versions of SNMP; these are SNMPv1, SNMPv2 and SNMPv3. SNMP version 1 and version 2 are similar.
SNMP version 1 is the initial implementation of the SNMP protocol, which operates over various protocols such as the User Datagram Protocol (UDP), IP, the AppleTalk Datagram-Delivery Protocol (DDP) and Novell Internet Packet Exchange (IPX). SNMP version 2 adds and enhances some protocol operations; for example, version 1 does not support 64-bit counters, whereas version 2 does. Neither of these two versions provides strong security. Version 3 provides strong security; however, it is more complex to set up.

SNMP is used in this thesis because it provides information about the target devices and services, such as CPU load, memory usage, bandwidth information and others. SNMP is very important for querying information from hardware-specific components such as switches and routers. [1, 177-178]

3 Methodology

The main goal of the thesis was to implement a simple network monitoring system which would not only ensure the availability of network devices and services but would also send notification alarms in case of failure. This was done by designing a physical network at the Metropolia UAS Communications and Network Engineering Laboratory, Leppävaara campus.

3.1 Requirements

Certain requirements were set by the instructor and the production manager before the implementation of a simple monitoring system was started. The main goal of the thesis was to meet these requirements. The system:

- Implements a simple monitoring system which monitors the availability of critical services and infrastructure components
- Is easy to use and install
- Is able to send a notification alarm (e-mail / SMS) in case of failure of network devices and services in the network
- Is cost-effective; an open source tool is preferable
- Can easily be expanded if there is a need to add new target devices to be monitored
- Provides maintenance documentation in the end
To meet the above requirements it was necessary to identify what kind of tools were needed, what materials were needed to start the lab work and what the target devices and services to be monitored were.

3.2 Materials Used

A physical network was built in the laboratory room from a switch, three computers with 2 GB of RAM each, which served as servers for testing three different monitoring tools, a simple Arduino weather web server and an internet connection.

The switch was a Cisco Catalyst 2960 series switch, which distributes the internal network. The three Dell computers were installed with the Ubuntu 14.04.1 Long Term Support (LTS) version of Ubuntu Server. The simple Arduino weather web server was constructed by a Metropolia student, and it has a DHT11 humidity and temperature sensor. The Arduino web server was delivered to me; however, it was not properly documented, so I had to search for libraries which supported the sensor type. This device is explained further in chapter 4, section 4.5.

3.3 Network Topology and Addressing Table

This section describes the network topology in the laboratory room and the addressing table.

Figure 3: Physical network topology in the network laboratory room

As can be seen in figure 3 above, the Cisco 2960 series switch is connected to the local Metropolia network. The three computers and the Arduino web server are connected to the switch and each is assigned an IP address. After assigning the IP addresses, I verified that the devices were fully connected by pinging between the computers and the other devices, which also generated traffic. Table 1 shows the addressing table of the physical and virtual network devices. These devices were the target devices whose status I was going to monitor.
Table 1: Addressing table for target devices

IP address   Protocols    Note
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      SNMP         Version 1 (port 161)
x.x.x.x      SNMP         Version 1 (port 161)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      UDP          Check_ntp
x.x.x.x      UDP/TCP/IP   Check_dns
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      http         Check_server_room_temperature

Most of the network devices listed in the table above are virtual devices; the exception is the Arduino weather web server (x.x.x.x), which is physical. As I explained in section 3.1, the monitoring system implemented monitors not only the status of the above devices but also their services. For example, a network device with an IP address of x.x.x.x is checked both for the availability of the device and for the Domain Name System (DNS) service.

3.4 Software

There are many network management tools available. Some are free of charge or partially free and others are not. One of the aims of this thesis was to implement a network monitoring system which is cost-effective, i.e. to implement the system using less expensive tools or, if possible, open-source monitoring tools.

To implement this kind of system, I had to go through a number of open-source network monitoring tools. These tools vary in their functionality, simplicity of configuration, ease of implementation and availability of plugins. Finally, I was able to shortlist and study three open-source network monitoring tools. These alternatives were selected based on how simple they are to implement and configure, their ability to send alarm notifications and my familiarity with some of them. The tools were installed on three computers, one on each computer. This gave a chance to analyse each tool independently and to choose the single tool which best met our goals and objectives.
3.4.1 Zabbix

Zabbix is a network monitoring solution designed to monitor the status and performance of network infrastructure components. Using Zabbix it is possible to collect different real-time data from the network. [9]

Real-time monitoring of network infrastructure means that the status and services of physical and virtual network devices can be monitored and their information stored in a database. This status information can be presented in maps or in graphs, which helps network administrators to visualize the trend of the network traffic and set thresholds for alerting.

Zabbix is free of charge and can be installed from the distribution packages. The version available at the time of installation was Zabbix 2.2 on Ubuntu 14.04 LTS. Ubuntu was selected because it can provide resilience, fault tolerance and the necessary performance. [9; 10]

The free disk space available was sufficient for Zabbix because only a few devices were monitored; in my case no more than 30. As the number of monitored devices increases, more free disk space is needed, but the basic requirement is 256 MB of free disk space. [10]

The Zabbix database requires significant CPU (Central Processing Unit) resources as the number of devices increases. All configuration definitions in Zabbix are stored in a database and cannot be changed directly; when changes are needed, they are made using the web interface. A good feature of Zabbix is its capability to present status information in graphs; however, it is not possible to get work histories and logs, because Zabbix does not have time-stamped comments like Nagios. [9]

Zabbix is well known for its good web interface, which makes it possible to visualize and compare the values of the devices it monitors.
The system can be configured using the web interface once the basic installation is completed. New hosts and services are added to the Zabbix server using the web interface, but this is harder to do because it takes more steps.

As can be seen in figure 4, once a Zabbix package is installed, new devices and services are added using the web interface.

Figure 4: Web interface for adding target devices. [9]

Target devices are configured by filling in the important fields as shown above. For example, the host name can be the name of the target device, the visible name is the name shown in the lists or maps of the web interface, and the IP address is the address of the target device. These are the basic parameters. Once a host is added, the next step is to add an item. Items collect data from the host or hosts, and each item needs an item key. For example, an item with the key net.dns[****.****.****.****, metropolia.fi, MX, 2, 1] checks whether the DNS (Domain Name System) service is UP or DOWN. A similar procedure was followed to add the other devices and services.

After the hosts and services had been added successfully, it was possible to watch the status of the devices in a web browser. In Zabbix, the graph of a monitored item and its trend can be viewed by clicking Latest data and then Graph on the far right side of the monitored item. For example, figure 5 below shows how the system load increased to its highest level during the last three minutes, and an e-mail notification was sent to notify about the problem.

Figure 5: CPU load [10]

So far I was able to create target devices and services. The status of these devices and services was displayed in a web browser. It was also possible to look at the graph of the collected data, and by analysing this data I was able to set thresholds.
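The idea of setting a threshold on collected data can be illustrated with a short sketch. This is not Zabbix trigger syntax and not code from the project; it is a minimal Python illustration, assuming a hypothetical `LoadTrigger` class, a hypothetical threshold of 5.0 and a window of three samples, that reports a problem once the average of the most recent load samples exceeds the limit.

```python
from collections import deque
from statistics import mean


class LoadTrigger:
    """Reports a problem when the average of the newest samples exceeds a threshold."""

    def __init__(self, threshold, window):
        self.threshold = threshold            # hypothetical limit, e.g. a CPU load of 5.0
        self.samples = deque(maxlen=window)   # keeps only the newest `window` samples

    def add_sample(self, value):
        """Record one measurement and return "PROBLEM" or "OK"."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        if window_full and mean(self.samples) > self.threshold:
            return "PROBLEM"
        return "OK"


if __name__ == "__main__":
    trigger = LoadTrigger(threshold=5.0, window=3)
    for load in [1.2, 4.8, 5.5, 6.1, 6.4]:
        print(load, trigger.add_sample(load))
```

Averaging over a window instead of reacting to a single sample is one way to avoid alarms for short spikes; the window length and the limit would have to be chosen from the observed graphs.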
According to our goals, the system has to send an e-mail/SMS notification when problems arise. Zabbix supports e-mail notification through triggers. A trigger is an expression which automatically notices problems in a monitored item. [9] E-mail configuration in Zabbix is done via the web interface like other configuration, and it is also possible to choose a media type (e-mail/SMS); in our case e-mail was chosen, because we did not have the facilities to implement SMS.

In Zabbix it is not possible to set scheduled maintenance for a specific period of time on specific hosts and services. To perform scheduled maintenance, the entire Zabbix server has to be set offline or the alarm system has to be disabled manually. [9; 11]

3.4.2 Cacti

The second tool which I tested for implementing the system was Cacti. Cacti is an open-source, web-based network monitoring tool built on RRDtool, the data logging and graphing tool developed by Tobi Oetiker. It is used to monitor network devices and services, and it stores and presents their statistics mainly in a graphical way; all the data is stored in a MySQL database. [12]

As stated in section 3.2, this software was also installed on the Ubuntu 14.04 operating system, and the Cacti version on the Cacti server (computer) was 0.8.8b. Before installing Cacti I had to check and install all dependencies:

- A LAMP server, which contains Apache 2.0, MySQL and PHP
- RRDtool, a system which Cacti uses to create graphs for the devices it monitors by storing data from the network devices
- SNMP and SNMPd; the latter is used to monitor the local host where Cacti is installed, and it needs to be configured

After the installation of the Cacti software package was completed on the computer, the basic web interface of Cacti was displayed by entering IP_address_of_the_cacti_server/cacti in any browser, as can be seen in figure 6 below.
Figure 6: Screenshot of Cacti dashboard [12]

As shown in the figure above, target devices and services can be added or deleted using the web interface. A new device is added by clicking Create devices, filling in the details of the device and saving the information. For example, to monitor a switch, I have to enter its details such as the IP address, a description of the device, the reachability options (PING, PING and SNMP, UDP, TCP) and the SNMP options (version 1, 2, 3). Once a device had been added, it was possible to see the newly generated graph for this device by clicking “Graph Management” on the left side of the navigation window.

Using the Cacti monitoring tool, I was able to make the following observations:

- It monitors the availability of network devices and services.
- It is open-source software and easy to set up.
- It presents the performance of a network device in a graphical way, which helps network administrators to analyse network traffic.
- It does not support an alerting mechanism such as e-mail/SMS to notify when a fault occurs on a network device, but this can be managed by integrating it with the Nagios monitoring tool.
- Working with the configuration interface is a little time-consuming; however, configuration changes can be made easily.
- Upgrading versions can be complex. [13]

3.4.3 Nagios

Nagios is open-source software used to monitor the availability of network devices and services. In simple terms, Nagios is a fault monitoring software package which monitors network devices using plugins. These plugins help the Nagios software to monitor a specific service such as HTTP, DNS, PING and SNMP. [1]

Like the previous two software tools, Nagios was also installed on Ubuntu 14.04 LTS, and the Nagios version installed was 3.5.1.
This was not the latest version, but I found it easy to install and sufficient for testing the software, and it can be upgraded when the system is installed in a virtual machine.

Nagios was installed from the package repository using apt-get. Before installing Nagios it is important to make sure that the LAMP server is installed. LAMP contains Apache 2.0, MySQL and PHP, which together provide the Apache web server on Linux. [11] The package contains all necessary dependencies, including postfix, which is important for sending e-mail alerts.

Once Nagios 3.5.1 is installed, the Nagios web interface can be accessed by entering IP_address_of_the_server/nagios3 in a web browser. Figure 7 displays the basic web interface of Nagios.

Figure 7: Screenshot of Nagios web interface

As can be seen, the basic setup page shows the Nagios version. On the left side of the navigation window, the target devices and services can be viewed by clicking Hosts and Services respectively, and other information is also available. Figure 7 shows a basic setup, i.e. target devices and services have not yet been added for monitoring.

To add and monitor hosts and services, Nagios plugins need to be installed, because Nagios cannot monitor network devices and services by itself; it needs programs called plugins. One of the good features of Nagios is the availability of a large number of plugins. Plugins are compiled programs or scripts written in different programming languages (C, C++, PHP and Perl) and executed by Nagios whenever there is a need to check the status of a network device or a service. [2]

Adding new hosts and services to the Nagios server is different from the two tools described above. It is impossible to configure hosts and services using the Nagios web interface. This is one of the drawbacks of Nagios unless it is integrated with other tools which support web configuration.
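Because Nagios is configured through plain-text object definitions rather than a web form, such definitions can also be generated by a small script. The sketch below is only an illustration and not the configuration used in this project: the `render_object` helper, the host name `lab-switch` and the address `192.0.2.10` are hypothetical, and it assumes that templates named `generic-host` and `generic-service` and a `check_ping` command taking warning and critical arguments are defined in the local installation.

```python
def render_object(kind, directives):
    """Render one Nagios object definition block from a dict of directives."""
    width = max(len(name) for name in directives)
    lines = [f"define {kind} {{"]
    for name, value in directives.items():
        # Pad directive names so that the values line up in one column.
        lines.append(f"    {name.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical target device; 192.0.2.10 is a documentation-only address.
host = render_object("host", {
    "use": "generic-host",            # assumed template name
    "host_name": "lab-switch",
    "alias": "Laboratory switch",
    "address": "192.0.2.10",
})

# Hypothetical availability check attached to the host above.
service = render_object("service", {
    "use": "generic-service",         # assumed template name
    "host_name": "lab-switch",
    "service_description": "PING",
    "check_command": "check_ping!100.0,20%!500.0,60%",  # assumed warning/critical arguments
})

if __name__ == "__main__":
    print(host)
    print()
    print(service)
```

The printed text would be saved in an object definition file, and Nagios would read it at the next restart.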
Hosts and services are added by writing host and service definitions in a text editor and saving them in object definition files. [1; 2] In my case they were saved in /etc/nagios3/conf.d/Target_devices.cfg and /etc/nagios3/conf.d/Nagios-services.cfg. These files must include information such as the IP address of the host, the host name, the notification options and other information. Once the hosts and services have been added, they are displayed on the Nagios web interface after the server is restarted so that the changes take effect. Finally the Nagios web interface looks like figure 8 below.

Figure 8: Status information of Nagios hosts and services

Figure 8 shows the status of network services. Normally Nagios presents the status of network devices and services in four states: ok, warning, critical and unknown. As can be seen above, the disk space service for the local host is in a critical state and action has to be taken to fix the problem. The cause of the problem can be seen by clicking the service itself, which displays detailed information about the failure.

The other very important feature of Nagios, which is one of the main goals of this thesis, is its ability to send alarms as e-mail notifications. Nagios can send e-mail notifications when failures occur in network devices and services. This is done by configuring, for each network device and service, when and in which cases to notify. For example, it is possible to set notification alerts for a device or service when it is in a critical state as shown in figure 8, in a warning state, in an unknown state or in an ok (recovery) state, based on the threshold limits set by the network administrator.

In general I found that Nagios met almost all of the goals and objectives of my thesis with respect to the following points:

- Open-source monitoring tool, i.e. free of charge
- Able to monitor the status of network devices and services
- Able to send notification alarms using e-mails
- Does not provide graphs showing trends of network traffic; however, this can be achieved by integrating Nagios with other tools such as Cacti
- Network administrators can store comments with time stamps
- Availability of a large number of plugins.

3.5 Software Selection

Each network monitoring solution has its own strengths and weaknesses. The three software solutions tested above provide many network monitoring features; in fact there are many additional features which are not among the main goals of the thesis. Both Zabbix and Cacti have a nice web interface that can present performance graphs and reports, but they are not flexible and take more time to configure through the web interface. [8, 10, 11] Nagios is flexible and easy to configure, and new hosts and services are created through shell scripts and text-based configuration files. In this regard I preferred Nagios, spent more time studying it than the other tools, and anticipated that implementing the system with Nagios would require less effort.

My emphasis was on evaluating the tools in terms of the goals and objectives set at the beginning of the thesis. All three software tools are open source and can monitor the status of network devices and services. Zabbix and Nagios are capable of sending e-mail notifications whereas Cacti cannot. Maintenance and upgrading of the system are easier in Nagios than in the others. Another good feature of Nagios is the large number of plugins available, and third-party plugins are also easy to implement. [1] For these reasons I found that Nagios met my goals and objectives, so I chose to implement the monitoring system using Nagios.

4 Details of Nagios Implementation

4.1 Architecture and Setup

Nagios does not by itself monitor and report problems existing on a device. Rather it uses plugins which return status information to Nagios.
[2] The objects monitored by Nagios can be divided into two categories, hosts and services. Hosts are physical or virtual machines such as servers, routers, switches, workstations and other network devices, whereas services are particular functionalities that can be defined as a service to be monitored, for example SNMP process services, HTTP, DNS and NTP services. Both hosts and services can be grouped into host groups and service groups. [2]

It is very important to understand how Nagios works and how it is architecturally designed. Figure 9 below shows how Nagios runs and works based on a client/server model. The Nagios server runs on a host, the plugins run on that server, and the remote hosts are the monitored targets. As can be seen from Figure 9, the plugins send information to the server and the server in turn displays it in the GUI. [1; 14]

Figure 9: Architectural design of Nagios [15]

Usually Nagios runs as a daemon and periodically runs plugins residing on the server. These plugins contact the hosts and services in the network and send information to Nagios, and the information is then shown on the Nagios web interface. [1, 2]

As can be seen above, Nagios has three important parts: the scheduler, the GUI and the plugins. The scheduler (Nagios server) is the server part of Nagios that runs the plugins at regular intervals and takes action based on their results. The GUI is the web interface of Nagios, generated by CGI (Common Gateway Interface) programs, and displays the status information of each host and service being monitored. It displays the status information as ok/warning/critical/unknown. Plugins are programs that are configurable by the user. They can be written by the user or installed with Nagios as a package. The main purpose of plugins is to check services and hosts and return the result to the Nagios server.
[14]

4.2 Soft and Hard States of Nagios

A state in which Nagios has not yet determined whether the detected status of a device or service is real is called a soft state. A host or a service stays in a soft state until the maximum number of check attempts is reached. Nagios checks the status of a host or a service at regular intervals. Figure 10 illustrates the soft and hard states of Nagios. [15]

Figure 10: Screenshot of status information

In order to avoid false alarms, Nagios allows the administrator to define how many times a host or a service has to be rechecked before the real status is determined. [1] For example, Nagios checks the disk space service up to four times; if the service is still in a critical state at the fourth check, the state is considered a hard state. When the status is in a critical hard state, Nagios sends an e-mail notification.

From Figure 10 we can understand that a critical soft state is the state in which Nagios first detects the non-ok state of the host or service, after which Nagios continues with the second attempt. If the critical soft state persists, the checks continue until the maximum number of check attempts (4) is reached, and at this point the state changes to a critical hard state. The critical hard state is the final and real state of the device; event handlers are executed and a notification is sent out. Then the check number is immediately reset to 1. [1; 15]

4.3 Nagios Configuration Files

During installation, the Nagios configuration files are placed in /etc/nagios3 by default. As shown in figure 7, Nagios has different configuration files, and some need to be edited or created.

Figure 7: Nagios configuration files [14]

The roles of the files are explained as follows:

Main configuration file: This is the most important file. It contains a number of directives which affect the operation of the Nagios daemon, and it is read by both the CGIs and the Nagios daemon.
Nagios starts its operation by looking at this file first.

Common Gateway Interface file (cgi.cfg): This file contains directives which affect the operation of the CGIs, and it mainly governs the web interface. It also contains a reference to the main configuration file, so that the CGIs know how Nagios is configured and where the object definitions and their status are stored.

Resource file: This file is mainly used to store sensitive information such as passwords, as well as user-defined macros. The CGIs are prevented from accessing this sensitive information.

Object definition files: These are the files where all host and service definitions are stored. Object definitions may include hosts, services, host groups, service groups, time periods, commands, contacts and contact groups. [1]

4.4 Nagios Plugins

Nagios cannot monitor network devices and services by itself. It needs programmes called plugins. Plugins are compiled executables or scripts written in different programming languages such as Shell, Perl and Python, and executed by Nagios whenever there is a need to check the status of a network device or a service. Plugins act as an abstraction layer between the Nagios daemon and the monitored objects, i.e. they are the link between Nagios and the hosts or services. Nagios does not have any idea about what is really being monitored. It is the plugin that knows which service or device is to be checked and how it is going to be checked. Nagios only gets the status information of these devices or services through plugins. [2, 14]

Plugins do not come together with the Nagios package; they need to be installed separately. There are more than 3000 Nagios plugins developed by the Nagios community. In addition, there are 50 official Nagios plugins which are developed by the official Nagios development team and are free of charge. It is also possible to write one's own plugins when there is a need.
[2, 3]

In this thesis I have used both plugins developed by the official Nagios team and plugins I developed myself. Some of the official Nagios plugins I used in this thesis were check_http, check_snmp, check_ping, check_ntp and check_dns. I have also written my own plugin, which monitors the temperature of the server room.

Plugins are installed and stored by default in the /usr/lib/nagios/plugins directory, but some distributions install them in different locations. Plugins can be installed either directly from the Nagios website or from http://nagiosplug.sourceforge.net

4.5 Writing New Plugins

One of the critical services that needed to be monitored in this thesis was the temperature level of the server room. The system needs to monitor the room temperature so that, if the temperature of the server room exceeds the threshold limits, the network administrator is notified and can take action.

Figure 11: Arduino-based Ethernet web server

The first task was to acquire an IP-based device which would read the temperature of the room. The school bought a device for this purpose a couple of times; however, none of the devices could be used, because of the permission rights needed to read live data from their servers. Finally, I decided to use an Arduino-based temperature reader with a DHT11 sensor, built by a Metropolia student, as shown in Figure 11 above. The device has an Ethernet shield which makes it possible to display the temperature and humidity of the room in a web browser.

The device was delivered to me without a manual, other than a small report about the device. The report shows what type of sensor was used and gives some description of the sensor, but it did not show what libraries were used or what programmes were needed to read from the sensor and display the values in the web browser. I had to assign an IP address to the device and download the Arduino IDE software, version 1.0.6, and a ready-written C++ programme from the internet.
[15] The C++ programme was modified to suit my purpose so that it read the temperature and humidity values from the sensor and presented them in a web browser. Some important libraries such as dht11.h and wire.h were also downloaded from the internet. [15] The programme first has to be compiled on the desktop computer, loaded onto the device and run. The temperature and humidity values can then be read by entering the IP address of the device in any web browser. I was able to read the temperature and humidity values from the web browser.

The main task was to monitor this device via Nagios by setting threshold values. If the temperature reading exceeds these values, Nagios has to send an e-mail notification to the network administrator. To make this happen, I had to find a Nagios plugin which could read real-time values from the web page and notify if the reading was above the specified limits. Unfortunately I could not find any ready-made Nagios plugin which could do this task. I finally decided to write my own plugin in Python which could read these values.

I chose Python because it is easier to write and I have better knowledge of it than of other languages; a friend also helped me in troubleshooting the code. I tested the plugin and it worked fine.

Figure 12: Service state information of Arduino temperature sensor

As can be seen in Figure 12 above, Nagios successfully monitored the status of the Arduino temperature sensor, and the system was able to notify via e-mail when the readings exceeded the limit.

5 Results and Discussion

The three monitoring solutions were able to monitor the status of the network devices and services. Additionally, some of these solutions, particularly Zabbix and Cacti, could present the performance of network services such as bandwidth, disk space and current load graphically, even though these features were not among the main goals.
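
As a minimal sketch of the kind of check described in Section 4.5, and not the actual plugin used in the project, the following Python code reads a web page, extracts a temperature value and maps it to a Nagios state. The exit codes 0, 1, 2 and 3 for ok, warning, critical and unknown follow the Nagios plugin convention. Everything else is an assumption made for illustration: the function names, the script name check_room_temp.py, the threshold arguments, and the expectation that the device page contains text such as "Temperature: 24".

```python
import re
import urllib.request

# Exit codes of the Nagios plugin convention: the exit status tells
# Nagios the state, and the printed line is shown in the web interface.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_temperature(page_text):
    """Return the first number following the word 'temperature', or None.

    Assumption: the device page contains text such as 'Temperature: 24'.
    """
    match = re.search(r"temperature\D*?(-?\d+(?:\.\d+)?)", page_text, re.IGNORECASE)
    return float(match.group(1)) if match else None


def evaluate(temperature, warn, crit):
    """Map a reading to an (exit_code, status_line) pair."""
    if temperature is None:
        return UNKNOWN, "TEMPERATURE UNKNOWN - no reading found on the page"
    if temperature >= crit:
        return CRITICAL, "TEMPERATURE CRITICAL - %.1f C" % temperature
    if temperature >= warn:
        return WARNING, "TEMPERATURE WARNING - %.1f C" % temperature
    return OK, "TEMPERATURE OK - %.1f C" % temperature


def main(argv):
    """Hypothetical usage: check_room_temp.py <url> <warn> <crit>."""
    try:
        url, warn, crit = argv[1], float(argv[2]), float(argv[3])
    except (IndexError, ValueError):
        print("TEMPERATURE UNKNOWN - usage: check_room_temp.py <url> <warn> <crit>")
        return UNKNOWN
    try:
        # Fetch the page served by the sensor device.
        with urllib.request.urlopen(url, timeout=10) as response:
            text = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError) as error:
        print("TEMPERATURE UNKNOWN - cannot read %s: %s" % (url, error))
        return UNKNOWN
    code, status_line = evaluate(parse_temperature(text), warn, crit)
    print(status_line)
    return code
```

A script built this way would finish by passing the return value of main(sys.argv) to sys.exit, so that Nagios receives the state as the process exit status. Keeping the parsing and the threshold comparison in separate functions allows both to be tested without the device being reachable.
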
Implementation of the project was successful: I was able to implement a system which monitored the status of network devices and reported via e-mail when any problems were encountered in network devices and services. Figure 13 shows a problem notification and a recovery notification sent by e-mail to the network administrator about a network service.

Figure 13: Screenshot of an e-mail notification

The monitoring solution selected for the implementation of this project was Nagios. Even though it was the best option to fulfil our goals and objectives, I would recommend installing Cacti along with Nagios for a better graphical presentation of critical network services.

Another problem encountered during the project was the availability of an IP-based network device. We changed the device twice, and it was time-consuming to get a new device.

The notification mechanism in this thesis was e-mail. However, I would suggest including other mechanisms such as SMS, because when there is no internet connection, no e-mail notification is sent to the network administrator.

6 Conclusion

The main goal of the thesis was to implement a simple network monitoring system which would ensure the availability of network devices and services at the Metropolia network laboratory, Leppävaara campus. The system needed to be simple, cost-effective and suitable for critical lab services and infrastructure components.

To implement a system which would meet the goals and objectives, three software solutions, Nagios, Zabbix and Cacti, were selected, implemented and evaluated with respect to their advantages and disadvantages. The best option, which could easily meet the goals and objectives of the thesis, was Nagios.
Using the Nagios monitoring tool I was able to implement a system with the following key features:

- Open source software
- Easily expandable to new targets
- Able to monitor the status of network devices and services
- Able to send notification e-mails to network administrators when a fault occurs in network devices and services
- Easy to upgrade and maintain

In general, implementing this kind of system helps to improve the quality of service by verifying the health status of network devices and services at regular intervals. Another benefit is that network administrators can easily identify the source of network faults and take action before they affect clients (end users).

References

1. Wojciech K. Learning Nagios 3.0. Birmingham, B27 6PA, UK: Packt Publishing Ltd; 2008.
2. Wolfgang Barth. Nagios. San Francisco, CA 94107: No Starch Press, Inc.; 2010.
3. Esad S. and Ivan I. Network Monitoring and Management Recommendations [online]. Serbia: AMRES led working group; February 2011. URL: http://services.geant.net/cbp/Knowledge_Base/Network_Monitoring/Documents/gn3-na3-t4-abpd101.pdf. Accessed April 17 2015.
4. Cisco. Network Management System: Best Practices White Paper [online]. San Jose, CA: Cisco; June 2007. URL: http://www.cisco.com/c/en/us/support/docs/availability/high-availability/15114-NMS-bestpractice.html. Accessed March 24 2015.
5. Dr. Foroughi. Network Management [online]. University of Southern Indiana: Indiana, USA; spring 2014. URL: http://www.usi.edu/business/aforough/Chapter%2020.pdf. Accessed April 19 2015.
6. Aiko Pras. Network Management Architectures [online]. Hengelo, The Netherlands: University of Twente; 1995. URL: http://www.hit.bme.hu/~jakab/edu/litr/TMN/Network_Management_Architectures_extr.pdf. Accessed February 2015.
7. Cisco. Network Management Basics [online]. San Jose, CA: Cisco Systems Inc.; October 2012. URL: http://docwiki.cisco.com/wiki/Network_Management_Basics. Accessed April 15 2015.
8. Justin Elingwood. An Introduction to SNMP [online].
ShareAlike 4.0 International: DigitalOcean Inc.; 2015. URL: https://www.digitalocean.com/community/tutorials/an-introduction-to-snmp-simple-network-management-protocol. Accessed March 2015.
9. Rihards Olups. Zabbix 1.8 Network Monitoring. Birmingham, B27 6PA, UK: Packt Publishing Ltd.; 2010.
10. Zabbix SIA. Zabbix Documentation [online]. Birmingham: Share Alike 3.0; 2014. URL: https://www.zabbix.com/documentation/2.2/start. Accessed December 2014.
11. Ed Simmonds and Jason H. Evaluation of Nagios and Zabbix monitoring [online]. Fermilab; February 2015. URL: http://cd-docdb.fnal.gov/0032/003277/001/nagios_zabbix_evaluation.pdf. Accessed March 2015.
12. Ian B., Tony R., Larry A. The Cacti Manual. The Cacti Group; 2012.
13. Dinangkur K. and S.M. Ibrahim. Cacti 0.8 Network Monitoring. Sydney: Packt Publishing Ltd.; 2009.
14. Nagios Core. Nagios Core Documentation [online]. Nagios Enterprises, LLC; 2014. URL: http://nagios.sourceforge.net/docs/3_0/toc.html. Accessed November 2014.
15. Manoj Chauhan. Nagios Architecture [online]. DISQUS; January 2010. URL: http://www.onaxer.com/2010/01/24/nagios-architecture/. Accessed March 2015.
16. Cisco. Network Management System [online]. Packet Storm; November 2002. URL: http://dl.packetstormsecurity.net/defcon10/MoreInfo/NetworkManagementSystem-BestPractices.pdf. Accessed April 2015.
17. Behrouz A. Data Communication and Networking. New York, USA: McGraw-Hill Companies, Inc.; 2007.