Monitoring Network Availability Using Nagios Mohammedadem Abdulkadir Information Technology

Mohammedadem Abdulkadir
Monitoring Network Availability Using Nagios
Helsinki Metropolia University of Applied Sciences
Bachelor of Engineering
Information Technology
Thesis
5 May 2015
Abstract
Author(s): Mohammedadem Abdulkadir
Title: Monitoring network availability using Nagios
Number of Pages: 35 pages
Date: 5 May 2015
Degree: Bachelor of Engineering
Degree Programme: Information Technology
Specialisation option:
Instructor(s): Bruk Yirdaw, Project Manager; Matti Puska, M.Sc, Principal Lecturer
The main goal of the thesis was to implement a simple network monitoring system that
would ensure the availability of network devices and services at the Metropolia UAS
Communications and Network Engineering laboratory, Leppävaara campus. The system
needed to be easy to use, cost-effective and compatible with critical lab services and
infrastructure components.
Many cost-effective network monitoring tools are currently available. This project
began by short-listing and comparing three open-source software solutions. They were
evaluated on their advantages and disadvantages with respect to the goals, for example
their ability to send alarms as e-mail or SMS (Short Message Service) notifications.
A physical network was built with three computers connected to a switch, which was
connected to the local network. Each computer served as a server for one of the listed
software solutions. Installation and implementation were carried out on a Linux
operating system. Comparing and contrasting the tools made it possible to choose the
management tool that best met the demands of the project.
The results of the project show that the availability of network devices and services
was monitored and that the system could generate e-mail alarms in case of network
failure.
Keywords
network management, open source, laboratory, Linux, SMS
Contents
1 Introduction
2 Theoretical Background
2.1 Network Management
2.2 Network Management Operations
2.2.1 Fault Management
2.2.2 Performance Management
2.2.3 Security Management
2.2.4 Configuration Management
2.2.5 Accounting Management
2.3 Network Management System Architecture
2.4 Network Management Protocol
2.4.1 Simple Network Management Protocol (SNMP)
3 Methodology
3.1 Requirements
3.2 Materials Used
3.3 Network Topology and Addressing Table
3.4 Software
3.4.1 Zabbix
3.4.2 Cacti
3.4.3 Nagios
3.5 Software Selection
4 Details of Nagios Implementation
4.1 Architecture and Setup
4.2 Soft and Hard States of Nagios
4.3 Nagios Configuration Files
4.4 Nagios Plugins
4.5 Writing New Plugins
5 Results and Discussion
6 Conclusion
References
1 Introduction
Network technicians and programmers work hard to develop new technologies and
innovations that make everyday life easier. Today almost every device is networked,
from simple devices to complex company systems, and to customers everything appears
to work smoothly. Behind the scenes, however, service providers invest large amounts
of money and time to keep everything working. The problem is how to manage networked
devices and make sure that each of them works without affecting the daily life of
customers. [2, 15-19]
The Metropolia UAS Communications and Network Engineering Laboratory on the Leppävaara
campus runs many physical and virtual network devices such as servers, network
switches, routers and personal computers. The laboratory staff found it difficult to
monitor the status of all these devices through physical check-ups, which is both
tedious and time-consuming. Therefore the department decided to adopt an effective,
simple and cost-effective solution that would monitor the status of these devices.
Network technicians and programmers have developed network management tools to
assess the status of each device and to report the results before clients notice a
problem, so that it is easy to take action on the affected device. These management
tools can be free, partially free or commercial. [2]
The main goal of the project is to implement a simple and cost-effective system which
would monitor the status of network devices and services at the network laboratory.
The system to be implemented should be:
• Easy to install and simple to use
• Easy to maintain and upgrade when needed
• Compatible
• Able to send notifications in case of failure
• Open source or partially open source
To meet the above demands I went through a selection of the monitoring tools available
and finally settled on three open-source network monitoring tools. Details of these
tools are described in the following chapters. Initially, the three tools were
installed on three personal computers in the laboratory room, with one switch
connected to the school internet network.
Using these tools, it was possible to monitor the status of sample devices in order to
compare and contrast the benefits and drawbacks of the software against the objectives
stated above. Based on this evaluation, I selected the tool that met my objectives
best.
2 Theoretical Background
2.1 Network Management
Network Management is a way of managing and maintaining network operations and
responding to changes in the network according to user requirements. [3] In today’s
network environment, it is challenging and time-consuming to manage a large number
of network devices and services by checking them manually. It is wiser to implement a
monitoring system based on a protocol such as the Simple Network Management Protocol
(SNMP).
Implementing a simple monitoring system helps to:
• Track the availability and reachability of network devices and services
• Track the performance of network devices and services
• Keep a history of network operations
• Keep an audit trail of changes
• Schedule upgrades easily
Therefore the main aim of network management is to identify network problems so that
they can be fixed before end-users notice them. The second important point is that the
system allows one to view the trends of network devices and services across the
network. Furthermore, a good network management system helps to track the usage of
resources, for example CPU usage, disk usage and the temperature level of the server
room, and to implement an alarm mechanism that notifies the administrator when a fault
occurs or when the performance of a network device or service goes beyond the limit
defined by the user. [3]
2.2 Network Management Operations
2.2.1 Fault Management
Fault Management is one of the main components of network management. Identifying and
analysing the cause of a fault in a network usually takes more time than solving the
problem itself. Hence a properly implemented network management system helps to keep
the network running at an optimum level. [5] Faults in a network can have different
causes, including software and hardware faults.
The main functions of fault management include:
• Remote monitoring of network devices and services from a single location
• Monitoring the status of devices and services constantly at a certain interval
• Setting threshold limits for potential failures
• Implementing alarm mechanisms which notify network administrators
• Tracing the location where a failure occurs, which makes it easier to identify and
fix the affected device or service
• Taking necessary measures
In general, Fault Management can be divided into two parts, active fault management
and passive fault management.
• Active fault management addresses problems by monitoring devices and services using
monitoring tools and checking whether each network device is available and responding.
• Passive fault management deals with notification mechanisms; it sends alarms when a
device or a service encounters problems. [6]
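The active side of fault management can be illustrated with a minimal Python sketch. It is not part of the monitoring system implemented in this thesis; the function names `is_reachable` and `poll` are hypothetical, and a TCP connection attempt is assumed here as the availability test in place of a ping.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Active check: try to open a TCP connection to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll(targets, notify, check=is_reachable):
    """Check every (host, port) target once and report each failure.

    `notify` stands in for the alarm mechanism (e-mail, SMS, ...).
    `check` is injectable so the loop can be tried without a network.
    """
    failed = []
    for host, port in targets:
        if not check(host, port):
            failed.append((host, port))
            notify(f"{host}:{port} is not responding")
    return failed
```

Calling `poll` at a fixed interval corresponds to constant monitoring, while the `notify` callback corresponds to the alarm mechanism listed above.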
2.2.2 Performance Management
Performance Management is one of the high-level management operations. Its main
tasks are monitoring and controlling network performance. This includes gathering
statistical data from the network traffic, investigating and analysing the log history
of the network, and analysing whether the trends of the network are positive or
negative and evaluating them based on the data provided. [2]
To measure the performance of a network, it is important to analyse the collected
records and go through the trends in the data. Proper analysis of performance records
helps the network administrator to:
• Work out different mechanisms for enhancing the performance of the network in the
future
• Forecast the threshold levels
• Set appropriate threshold levels so that, if the level exceeds these limits, an
alarm is generated to indicate that some attention is needed.
There are different types of network monitoring tools which monitor performance
parameters of a network device, such as network traffic flow, bandwidth, speed and
media capacity. These parameters are presented graphically, as percentages or in other
forms, so that the network manager can analyse the trends, evaluate them against the
defined threshold levels and take the necessary action. [5; 6]
2.2.3 Security Management
Security Management is the core of the Network Management Operation (NMO). It is
responsible for securing the flow of network traffic, making sure that it flows
smoothly and is protected from outside intrusions. [4]
A network without proper security is like a country without defence forces. There
should be forces which detect and prevent threats to the security of the nation from
both inside and outside the country. [4; 5] In the same way, network security
management detects and prevents intrusions into the network.
Security Management is concerned with access rights to network devices and services.
It is responsible for protecting user information gathered from the network devices.
It is very important to consider security issues when monitoring network devices and
services. For example, only authorized personnel should be allowed to access the
status of the devices and the detailed information about nodes and services, so that
they are protected from external damage. Strong security measures in a network
environment also prevent unauthorized personnel from changing the configuration of
services and devices.
2.2.4 Configuration Management
A large networked environment contains different network devices, and these devices
can be configured for different purposes. For example, a personal computer can be
configured to act as a server, or the same device can be configured to serve as a
switch, a router or both. Once the role of a device has been decided, the
configuration manager chooses what kind of software is required and sets its values.
Configuration Management can be described as an important function of network
management which monitors network and system information so that the effects of
network operations can be traced and managed. [5] The configuration manager can start
up and shut down the network or part of it. A configuration management system stores
information about each network device and the versions of the application software
installed on it, such as SNMP (Simple Network Management Protocol) version 3.1 or
TCP/IP (Transmission Control Protocol/Internet Protocol) software version 2.0. This
information is stored in a database, where it can be accessed easily when a problem
occurs on the device, and it helps to solve the problem.
2.2.5 Accounting Management
Accounting Management is another important network management function, which tracks
the usage of resources so that customers can be charged according to their usage. [5,
6] The accounting management system is also called the allocation level because it
distributes resources optimally and fairly among users. [7]
Accounting Management is mostly related to billing users for their network usage, for
example monitoring the use of a server by a group of users and charging them for the
resources they use. Accounting management helps to gather data about network
utilization; this can be done by collecting the traffic counters of switches and
routers. [6, 7]
Billing of users can be performed according to:
• The total number of transactions; this includes the number of logins to a computer
or server, e-mails sent and other login sessions
• The total number of packets; the charge may vary with the size of the packets, with
large packets charged less than small packets
• The total number of bytes; in this case users are billed for receiving packets. This
billing system has its own drawbacks, for example users are also billed for receiving
acknowledgement packets. [7]
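As an illustration of how the traffic counters mentioned above can be turned into usage figures, the following Python sketch computes the number of bytes transferred between two counter readings. The function names are hypothetical, and the sketch assumes that the counter wraps around at most once between two readings.

```python
def counter_delta(previous, current, bits=32):
    """Bytes counted between two readings of a wrapping traffic counter.

    Assumption: the counter wrapped around at most once in between.
    """
    modulus = 1 << bits
    return (current - previous) % modulus


def average_rate_bps(previous, current, interval_seconds, bits=32):
    """Average bit rate over the polling interval, in bits per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current, bits) * 8 / interval_seconds
```

The modulo operation handles the case where a 32-bit counter reaches its maximum and restarts from zero between two polls.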
2.3 Network Management System Architecture
A Network Management System is designed to show the hierarchy and relationships
between the managed devices and the management entity within the network. [7]
Figure 1 shows a network management system architecture which helps to view the whole
network as a unified architecture.
Figure 1: Network management system architecture [7]
The components shown in the figure are explained as follows:
• Managed devices: end devices such as personal computers, switches, routers and
other network devices. These devices run network management software which enables
them to send alerts to the management entity in case of problems. For example, if a
control centre sets a 20% packet loss threshold for a network device and the packet
loss exceeds this limit, the device sends an alert to the management entity. When the
management entity, which is programmed to execute actions such as notification,
receives the alert, it takes action to fix the problem.
• Management entity: programmed to respond to alerts issued by network devices and to
take actions according to predefined settings.
• Agents: software modules which collect network management data, store them in the
management database and send them to the management entity within the network using
a network management protocol such as SNMP or the Common Management Information
Protocol (CMIP).
• Proxies: entities which collect information and send it to the management entities
on behalf of other entities. [7; 8]
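The packet loss example given for managed devices can be sketched in Python as follows. This is only an illustration under assumed names: `packet_loss_percent` and `evaluate_device` are hypothetical helpers, and the returned text stands in for the alert sent to the management entity.

```python
def packet_loss_percent(sent, received):
    """Share of packets that were sent but never answered, in percent."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    return 100.0 * (sent - received) / sent


def evaluate_device(name, sent, received, threshold_percent=20.0):
    """Return an alert text when loss is above the threshold, else None."""
    loss = packet_loss_percent(sent, received)
    if loss > threshold_percent:
        return (f"ALERT {name}: packet loss {loss:.1f}% "
                f"exceeds {threshold_percent:.1f}%")
    return None
```

With the 20% threshold from the example, a device that answers 7 of 10 packets raises an alert, while a device that answers 8 of 10 does not, because the loss must be more than the limit.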
2.4 Network Management Protocol
Network Management protocols are used to send management data between the target
network devices and the management console. [16] The network management protocol
used most in this thesis is SNMP, and it is discussed below.
2.4.1 Simple Network Management Protocol (SNMP)
SNMP is a network protocol used to manage network devices such as workstations,
servers, switches, routers and other devices which run network management software.
A management station (manager) collects status information from network devices
running an SNMP agent on a TCP/IP network. [8]
To do management tasks, SNMP uses two other protocols:
• Structure of Management Information (SMI): defines the general rules for naming
objects, defining object types and showing how to encode them.
• Management Information Base (MIB): creates a collection of named objects, their
types and their relationships to each other in an entity to be managed.
In other words, management on the internet is done through the cooperation of three
protocols: SNMP, SMI and MIB.
SNMP uses the services of UDP (User Datagram Protocol) on two well-known ports, 161
and 162. Port 161 is used by the agent and port 162 is used by the manager
(client). [17]
There are two ways of communication between the manager and the agent. In the first,
the manager sends a get request to retrieve information from the agent, or a set
request to change some values. In the second, the agent notifies the manager about a
fault by sending out an SNMP trap. [1, 224-225]
Figure 2 illustrates the communication types of SNMP.
Figure 2: Communication types of SNMP. [1]
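The three message types described above (get, set and trap) can be modelled with a small in-memory Python sketch. It is a toy model only: the class `ToyAgent` is hypothetical and does not send real SNMP packets over UDP. The two object identifiers are the standard MIB-II identifiers for sysName.0 and for ifAdminStatus of interface 1.

```python
SYS_NAME = "1.3.6.1.2.1.1.5.0"                # sysName.0
IF_ADMIN_STATUS_PORT1 = "1.3.6.1.2.1.2.2.1.7.1"  # ifAdminStatus.1: 1 = up, 2 = down


class ToyAgent:
    """In-memory stand-in for an SNMP agent; it does not speak real SNMP."""

    def __init__(self, mib, send_trap):
        self.mib = dict(mib)        # OID -> current value
        self.send_trap = send_trap  # callback standing in for the manager (port 162)

    def get(self, oid):
        """Manager -> agent: read one value (get request)."""
        return self.mib[oid]

    def set(self, oid, value):
        """Manager -> agent: change one value (set request)."""
        if oid not in self.mib:
            raise KeyError(oid)
        self.mib[oid] = value

    def report_fault(self, oid, message):
        """Agent -> manager: unsolicited notification (trap)."""
        self.send_trap(oid, message)
```

In this model the manager always starts a get or set exchange, whereas a trap is started by the agent, which matches the two directions of communication shown in figure 2.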
As its name indicates, SNMP is simple but powerful. It helps to manage the network by:
• Gathering performance information about target devices, for example bandwidth
• Sending alarms when network devices fail
• Monitoring the status of critical resources such as memory use and CPU load
• Performing active polling by querying devices at certain intervals
• Providing read and write access to network devices, so that it is possible, for
example, to switch a single port of a switch on or off
• Monitoring the air temperature inside a data centre (server room)
There are three versions of SNMP: SNMPv1, SNMPv2 and SNMPv3. SNMP version 1 is the
initial implementation of the SNMP protocol, which operates over various protocols
such as User Datagram Protocol (UDP), IP, AppleTalk Datagram-Delivery Protocol (DDP)
and Novell Internet Packet Exchange (IPX). SNMPv1 and SNMPv2 are almost similar;
however, SNMPv2 adds and enhances some protocol operations and, unlike version 1,
supports 64-bit counters. Neither version 1 nor the community-based version 2
(SNMPv2c) provides strong security, because both rely on community strings sent in
plain text. Version 3 provides strong security; however, it is more complex to set up.
The reason for the usage of SNMP in this thesis is that it provides information about
the target device and services such as CPU load, memory usage, bandwidth
information and others. SNMP is very important in querying information when it comes
to hardware-specific components such as switches and routers. [1, 177-178]
3 Methodology
The main goal of the thesis was to implement a simple network monitoring system,
which would not only ensure the availability of network devices and services but would
also send notification alarms in case of failure. This was done by designing a physical
network at the Metropolia UAS Communications and Network Engineering Laboratory,
Leppävaara campus.
3.1 Requirements
Certain requirements were set by the instructor and the production manager before the
implementation of the monitoring system started. The main goal of the thesis was to
meet these requirements. The requirements were that the system:
• Implements a simple monitoring system which monitors the availability of critical
services and infrastructure components
• Is easy to use and install
• Is able to send a notification alarm (e-mail or SMS) in case of failure of network
devices and services in the network
• Is cost-effective; an open-source tool is preferable
• Can be expanded easily if there is a need to add new target devices to be monitored
• Provides maintenance documentation in the end.
To meet the above requirements it was necessary to identify what kind of tools were
needed, what materials were used to start the lab work and what the target devices
and services to be monitored were.
3.2 Materials Used
A physical network was built in the laboratory room with a switch, three computers
with 2 GB of RAM each, which served as servers for testing three different monitoring
tools, a simple Arduino weather web server and an internet connection.
The switch was a Cisco Catalyst 2960 series switch, which distributes the internal
network. The three Dell computers were installed with the Ubuntu 14.04.1 Long Term
Support (LTS) version of Ubuntu Server. The simple Arduino weather web server was
built by a Metropolia student, and it has a DHT11 humidity and temperature sensor. The
Arduino web server was delivered to me without proper documentation, so I had to
search for libraries which supported the sensor type. This device is explained further
in chapter 4, section 4.5.
3.3 Network Topology and Addressing Table
This section describes the network topology in the laboratory room and the addressing
table.
Figure 3: Physical network topology in the network laboratory room
As can be seen in figure 3 above, the Cisco 2960 series switch is connected to the
local Metropolia network. The three computers and the Arduino web server are connected
to the switch, and each is assigned an IP address. After assigning the IP addresses, I
verified that the devices were fully connected by generating traffic with ping between
each computer and the other devices.
Table 1 shows the addressing table of the physical and virtual network devices. These
devices were the target devices whose status I was going to monitor.
Table 1: Addressing table for target devices
IP address   Protocols    Note
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      SNMP         Version 1 (port 161)
x.x.x.x      SNMP         Version 1 (port 161)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      UDP          Check_ntp
x.x.x.x      UDP/TCP/IP   Check_dns
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      IP           Ping (connectivity check-up)
x.x.x.x      http         Check_server_room_temperature
Many of the network devices listed in the table above are virtual devices, whereas the
Arduino weather web server (x.x.x.x) is physical. As I explained in section 3.1, the
monitoring system implemented monitors not only the status of the above devices but
also their services. For example, the network device with the IP address x.x.x.x is
checked both for the availability of the device and for the Domain Name System (DNS)
service.
3.4 Software
There are many network management tools available. Some are free of charge or
partially free and others are not. One of the aims of this thesis was to implement a
network monitoring system which is cost-effective, i.e. to implement a system using
inexpensive tools or, if possible, open-source monitoring tools.
To implement this kind of system, I had to go through a number of open-source network
monitoring tools. These tools vary in their functionality, simplicity of
configuration, ease of implementation and availability of plugins.
Finally, I shortlisted and studied three open-source network monitoring tools. They
were selected based on how simple they are to implement, how easy they are to
configure, their ability to send alarm notifications and my familiarity with some of
them. The tools were installed on three computers, one on each computer. This made it
possible to analyse each tool independently and to choose the single tool which met
our goals and objectives best.
3.4.1 Zabbix
Zabbix is a network monitoring solution that is designed to monitor the status and
performance of network infrastructure components. Using Zabbix it is possible to
collect different real-time data from the network. [9]
Real-time monitoring of network infrastructure means that the status and services of
physical and virtual network devices can be monitored and their information can be
stored in a database. This status information can be presented in maps or in graphs,
which helps network administrators to visualize the trend of the network traffic and
set thresholds for the purpose of alerting. Zabbix is a free monitoring tool; the
latest version can be installed from the distribution packages. At the time of
installation the latest version was Zabbix 2.2, and it was installed on Ubuntu 14.04
LTS. Ubuntu was selected because it can provide resilience, fault tolerance and the
necessary performance. [9; 10]
The free disk space available for installing Zabbix was sufficient because the number
of monitored devices was small; in my case no more than 30 devices were monitored.
As the number of monitored devices increases, the required free disk space also
increases, but the basic requirement is 256 MB of free disk space. [10] The Zabbix
database requires significant CPU (Central Processing Unit) resources as the number of
devices increases. All configuration definitions in Zabbix are stored in a database
and cannot be edited directly; if changes are needed, they are made using the web
interface.
One of the good features of Zabbix is its capability to present status information in
graphs; however, it is not possible to get work histories and logs because Zabbix does
not have time-stamped comments like Nagios. [9]
Zabbix is well known for its good web interface, which makes it possible to visualize
and compare the values of the devices it monitors. The system can be configured using
the web interface once the basic installation is completed. New hosts and services are
also added to the Zabbix server through the web interface, but this is harder to
configure because it takes more steps.
As can be seen in figure 4, once the Zabbix package is installed, new devices and
services are added using the web interface.
Figure 4: Web interface for adding target devices. [9]
Target devices are configured by filling in the important fields as shown above. For
example, the host name can be the name of the target device, the visible name is the
name shown in the lists or maps on the web interface, and the IP address is the
address of the target device. These are the basic parameters. Once a host has been
added, items are added to it. Items collect data from one or more hosts. To do this I
needed to use an item key. For example, an item with the key
net.dns[****.****.****.****, metropolia.fi, MX, 2, 1] checks whether the DNS (Domain
Name System) service is UP or DOWN. A similar procedure was followed to add as many
devices and services as possible. After the hosts and services had been added
successfully, it was possible to watch the status of the devices in a web browser.
Using Zabbix, it is possible to view a graph showing the trend of a monitored item by
clicking latest data and then the graph link on the far right side of the monitored
item on the web screen.
For example, figure 5 below shows how the system load rose to its highest level over
the last three minutes, after which an e-mail notification was sent about the problem.
Figure 5: CPU load [10]
So far I had managed to create target devices and services. The status of these
devices and services was displayed in a web browser. It was also possible to look at
the graphs of the collected data, and by analysing this data I was able to set
thresholds.
According to our goals, the system has to send an e-mail or SMS notification when
problems arise. Zabbix supports e-mail notification through triggers. A trigger is an
expression which automatically notices problems in a monitored item. [9] E-mail
configuration in Zabbix is done, like other configuration, via the web interface. It
is also possible to choose a media type (e-mail or SMS); in our case e-mail was chosen
because we did not have the facilities to implement SMS.
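The idea of a trigger can be sketched in Python as a function which evaluates the most recent samples of an item against a threshold. This is not Zabbix trigger syntax; the function name `trigger_fires`, the averaging rule and the assumption of one sample per minute are hypothetical choices made for illustration.

```python
from statistics import mean


def trigger_fires(samples, threshold, window=3):
    """Return True when the average of the last `window` samples
    is above `threshold`.

    Assumption: one sample per minute, so window=3 looks at roughly
    the last three minutes of data.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < window:
        return False  # not enough data collected yet
    return mean(samples[-window:]) > threshold
```

When such a condition becomes true, the monitoring system would pass the event on to the configured media type, which in this project was e-mail.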
In Zabbix it is not possible to schedule maintenance for a specific period of time on
specific hosts and services. To carry out scheduled maintenance, the entire Zabbix
server has to be set offline or the alarm system has to be disabled manually. [9; 11]
3.4.2 Cacti
The second software tool that I tested was Cacti. Cacti is an open-source, web-based
network monitoring tool built on RRDtool, which was developed by Tobi Oetiker.
It is used to monitor network devices and services, and it stores and presents their
statistics mainly in graphical form. All the data are stored in a MySQL database. [12]
As stated in section 3.2, this software was also installed on the Ubuntu 14.04
operating system, and the Cacti version on the Cacti server (computer) was 0.8.8b.
Before installing Cacti I had to check and install all dependencies:
• LAMP server, which contains Apache 2.0, MySQL and PHP
• RRDtool, a system which Cacti uses to store data from the network devices it
monitors and to create graphs from them
• SNMP and SNMPd; the latter is used to monitor the local host where Cacti is
installed, and it needs to be configured.
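RRDtool stores data in a round-robin fashion: the storage has a fixed size, and the oldest values are overwritten when it is full. The following Python sketch shows this principle only. The class `RoundRobinStore` is hypothetical and does not model RRDtool's file format or its consolidation of old values.

```python
from collections import deque


class RoundRobinStore:
    """Fixed-size sample store in which new samples overwrite the oldest."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen drops the oldest entry automatically.
        self._samples = deque(maxlen=capacity)

    def add(self, timestamp, value):
        self._samples.append((timestamp, value))

    def values(self):
        """Stored values, oldest first."""
        return [value for _, value in self._samples]
```

Because the size never grows, the disk space needed for one monitored item stays constant no matter how long the device is monitored.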
After installation of the Cacti software package was completed on the computer, the
basic web interface of Cacti was displayed by entering
IP_address_of_the_cacti_server/cacti in any browser, as can be seen in figure 6 below.
Figure 6: Screenshot of Cacti dashboard [12]
As shown in the figure above, target devices and services can be added or deleted
using the web interface. A new device is added by clicking create devices, filling in
the details of the device and saving the information. For example, to monitor a switch
I have to add its details, such as the IP address, a description of the device,
reachability options (PING, PING and SNMP, UDP, TCP) and SNMP options (version 1, 2,
3). Once a device had been added, it was possible to see the newly generated graph for
it by clicking “graph Management” on the left side of the navigation window.
Using the Cacti monitoring tool I was able to make the following observations:
• It monitors the availability of network devices and services.
• It is open-source software and easy to set up.
• It presents the performance of a network device graphically, which helps network
administrators to analyse network traffic.
• It does not support alerting mechanisms such as e-mail or SMS to notify when a fault
occurs in a network device, but this can be achieved by integrating it with the Nagios
monitoring tool.
• Configuration is a little time-consuming; however, configuration changes can be made
easily.
• Upgrading versions can be complex. [13]
3.4.3 Nagios
Nagios is open-source software used to monitor the availability of network devices and
services. In simple terms, Nagios is a fault monitoring software package which
monitors network devices using plugins. Each plugin helps Nagios to monitor a specific
service such as HTTP, DNS, PING or SNMP. [1]
Like the previous two tools, Nagios was installed on Ubuntu 14.04 LTS, and the Nagios
version installed was 3.5.1. This was not the latest version, but I found it easy to
install and considered it sufficient for testing the software; it can be upgraded when
it is installed in a virtual machine.
Nagios was installed from the package repository using apt-get. Before installing
Nagios it is important to make sure that the LAMP server is installed. LAMP contains
Apache 2.0, MySQL and PHP, which provide the Apache web server on Linux. [11] The
package pulls in all necessary dependencies, including postfix, which is important for
sending e-mail alerts.
Once Nagios 3.5.1 is installed, the Nagios web interface can be accessed by entering
IP_address_of_the_server/nagios3 in a web browser. Figure 7 displays the basic web
interface of Nagios.
Figure 7: Screenshot of Nagios web interface
As can be seen, the basic setup page shows the Nagios version. On the left side of the
navigation window, it is possible to view the target devices and services by clicking
Hosts and Services respectively, and other information is also available. Figure 7
shows a basic setup, i.e. target devices and services have not yet been added to be
monitored. To add and monitor hosts and services, Nagios plugins need to be installed,
because Nagios cannot monitor network devices and services by itself; it needs
programs called plugins. One of the good features of Nagios is the availability of a
large number of plugins. Plugins are compiled programs or scripts written in different
programming languages (C, C++, PHP and Perl) and executed by Nagios whenever there is
a need to check the status of a network device or a service. [2]
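To illustrate what a plugin looks like from the point of view of Nagios, the following Python sketch follows the standard plugin convention: the plugin prints one line of status text and reports its result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The check itself is a hypothetical example which compares a measured value given on the command line with two thresholds; it is not one of the plugins used in this thesis.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_value(value, warning, critical):
    """Map a measured value to a Nagios state and a status line."""
    if value >= critical:
        state = CRITICAL
    elif value >= warning:
        state = WARNING
    else:
        state = OK
    return state, f"CHECK {LABELS[state]} - value is {value:.2f}"


def main(argv):
    """Expects: <value> <warning> <critical>. Returns the exit code."""
    try:
        value, warning, critical = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: the plugin cannot decide.
        print("CHECK UNKNOWN - usage: <value> <warning> <critical>")
        return UNKNOWN
    state, text = check_value(value, warning, critical)
    print(text)
    return state

# A real plugin file would end with: sys.exit(main(sys.argv))
```

Nagios only reads the exit code and the printed line, which is why plugins can be written in any language.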
Adding new hosts and services to the Nagios server differs from the two tools
described above. Hosts and services cannot be configured through the Nagios web
interface. This is one of the drawbacks of Nagios, unless it is integrated with other tools
which support web configuration. Hosts and services are added by writing host and
service definitions in a text editor and saving them in object definition files. [1; 2]
In my case they were saved in /etc/nagios3/conf.d/Target_devices.cfg and
/etc/nagios3/conf.d/Nagios-services.cfg. These files must include information such as
the IP address of the host, the host name, notification options and other information.
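As a rough illustration of what such object definition files contain, the following Python sketch renders one host definition and one service definition in the Nagios object syntax. The host name, alias and address are placeholders, and the generic-host and generic-service templates are assumed to exist as in the default Ubuntu package configuration. It is not the configuration used in this thesis.

```python
def render_object(kind, directives):
    """Render one Nagios object definition block as text."""
    width = max(len(name) for name in directives)
    lines = ["define %s {" % kind]
    for name, value in directives.items():
        lines.append("    %s  %s" % (name.ljust(width), value))
    lines.append("}")
    return "\n".join(lines)


# Placeholder values for illustration only (192.0.2.0/24 is a documentation range).
host = render_object("host", {
    "use": "generic-host",
    "host_name": "lab-switch-1",
    "alias": "Lab switch",
    "address": "192.0.2.10",
})

service = render_object("service", {
    "use": "generic-service",
    "host_name": "lab-switch-1",
    "service_description": "PING",
    "check_command": "check_ping!100.0,20%!500.0,60%",
})

if __name__ == "__main__":
    print(host)
    print()
    print(service)
```

Text of this form would be saved in a .cfg file under /etc/nagios3/conf.d, after which Nagios has to be restarted to load it.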
Once the host and service definitions have been added, the Nagios server must be
restarted for the changes to take effect. The hosts and services are then displayed
automatically in the Nagios web interface, which finally looks like Figure 8 below.
Figure 8: Status information of Nagios hosts and services
Figure 8 shows the status of network services. Normally Nagios presents the
status of network devices and services in four states: OK, warning, critical and
unknown. As can be seen above, the disk space service for the local host is in a critical
state and action has to be taken to fix the problem. The cause of the problem can
be seen by clicking the service itself, which displays detailed information about the
failure.
Another very important feature of Nagios, and one of the main goals of this
thesis, is its ability to send alarms as e-mail notifications when a failure occurs in
network devices or services.
Notifications are configured per device and service by specifying when and in
which cases to notify. For example, notifications can be enabled for a device or
service when it enters a critical state (as shown in Figure 8), a warning state, an
unknown state or an OK (recovery) state, based on the threshold limits set by the
network administrator.
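The selection of states described above can be sketched in Python as follows. The letters w, u, c and r are the values accepted by the notification_options directive of a Nagios service definition. The function itself is a simplified illustration: it ignores notification periods, intervals, contacts and the rule that a recovery is only reported after a problem notification.

```python
# Letters used by the Nagios service directive `notification_options`.
STATE_TO_OPTION = {
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "OK": "r",  # a return to OK is reported as a recovery
}


def should_notify(new_state, notification_options):
    """Return True if a hard change to new_state matches the configured options."""
    enabled = {opt.strip() for opt in notification_options.split(",")}
    return STATE_TO_OPTION[new_state] in enabled


if __name__ == "__main__":
    # Only critical problems and recoveries are enabled in this example.
    for state in ("WARNING", "CRITICAL", "OK"):
        print(state, should_notify(state, "c,r"))
```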
In general, I found that Nagios met almost all of the goals and objectives of my thesis.
The following points summarize the evaluation:
• Open-source monitoring tool, i.e. free of charge
• Able to monitor the status of network devices and services
• Able to send notification alarms by e-mail
• Does not provide graphs showing trends of network traffic; however, this
can be achieved by integrating with other tools such as Cacti
• Network administrators can store comments with time stamps
• A large number of plugins is available.
3.5 Software Selection
Each network monitoring solution has its own strengths and weaknesses. The three
software solutions tested above provide many network monitoring features, including
many additional features that are not among the main goals of the thesis. Both
Zabbix and Cacti have a nice web interface that can present performance graphs and
reports, but they are less flexible and take more time to configure through the web
interface. [8, 10, 11] Nagios is flexible and easy to configure, and new hosts and
services are created through shell scripts and text-based configuration files. In this
regard I preferred Nagios; I spent more time studying Nagios than the others and
anticipated that implementing the system with Nagios would require less effort.
My emphasis was on evaluating the tools in terms of the goals and objectives set at the
beginning of the thesis. All three software tools are open source and can monitor the
status of network devices and services. Zabbix and Nagios are capable of sending
e-mail notifications, whereas Cacti cannot. Maintaining and upgrading the system is
easier in Nagios than in the others. Another good feature of Nagios is the large number
of plugins available, and third-party plugins are also easy to implement. [1]
Therefore, for the reasons stated above, I found that Nagios met my goals and
objectives, and I chose to implement the monitoring system using Nagios.
4 Details of Nagios Implementation
4.1 Architecture and Setup
Nagios does not itself monitor and report problems on a device. Rather, it
uses plugins which return status information to Nagios. [2] The objects monitored by
Nagios can be divided into two categories: hosts and services. Hosts are physical
or virtual machines such as servers, routers, switches, workstations and
other network devices, whereas services are particular functionalities of a host that
are defined to be monitored, for example SNMP processes or HTTP, DNS
and NTP services. Both hosts and services can be collected into host groups and
service groups. [2]
It is important to understand how Nagios works and how it is architecturally designed.
Figure 9 below shows how Nagios works based on a client/server model. The Nagios
server runs on one host, the plugins run on that server, and all other remote hosts are
monitored from it. As can be seen from Figure 9, the plugins send information to the
server, and the server in turn displays it in the GUI. [1; 14]
Figure 9: Architectural design of Nagios [15]
Usually Nagios runs as a daemon and periodically runs plugins residing on the server.
These plugins contact the hosts and services in the network and return
information to Nagios, which then shows it on the Nagios web
interface. [1, 2]
As can be seen above, Nagios has three important parts: the scheduler, the GUI and
the plugins.
• The scheduler (Nagios server) is the server part of Nagios. It runs the plugins
at regular intervals and takes action based on their results.
• The GUI is the web interface of Nagios, displayed as web pages generated by
CGI. It displays the status information of each monitored host and service
as OK/warning/critical/unknown.
• Plugins are programmes that are configurable by the user. They can be written
by the user or installed as a package alongside Nagios. The main purpose
of plugins is to check services and hosts and return the result to the Nagios
server. [14]
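The interaction between the scheduler and a plugin can be sketched in Python as follows. By the Nagios plugin convention, a plugin reports its result through its exit code (0 OK, 1 warning, 2 critical, 3 unknown) and a line of text on standard output. The run_plugin function and the stand-in plugin command are illustrative assumptions, not part of Nagios itself.

```python
import subprocess
import sys

# Exit codes defined by the Nagios plugin convention.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=10):
    """Run one plugin command and return (state, first line of output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The plugin could not be started or did not finish in time.
        return "UNKNOWN", str(exc)
    state = STATES.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.strip().splitlines()
    return state, (lines[0] if lines else "")


if __name__ == "__main__":
    # Stand-in for a real plugin: prints a status line and exits with code 1.
    fake_plugin = [sys.executable, "-c",
                   "import sys; print('WARNING - load high'); sys.exit(1)"]
    print(run_plugin(fake_plugin))
```

A real scheduler would call such a function for every configured service at its check interval and pass the result on to the state logic and the web interface.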
4.2 Soft and Hard states of Nagios
A soft state is a state in which Nagios has not yet determined whether the detected
status of the device or service is real. Nagios checks the status of a host or a service
at regular intervals, and a host or a service stays in a soft state until the maximum
number of check attempts is reached. Figure 10 illustrates the soft and hard states of
Nagios. [15]
Figure 10: Screen shot of status information
To avoid false alarms, Nagios allows defining how many times a host
or a service has to be rechecked before its real status is determined. [1] For example,
suppose Nagios checks the disk space service with a maximum of four check attempts.
If the service is still in a critical state at the fourth attempt, it is considered to be in a
hard state. When the service is in a critical hard state, Nagios sends an e-mail
notification.
Figure 10 shows that a critical soft state begins when Nagios first detects the non-OK
state of the host or service. Nagios then makes a second attempt. If the critical state
persists, it remains a soft state and the checks continue until the maximum number of
check attempts (4) is reached, at which point the state changes to a critical hard state.
The critical hard state is the final, real state of the device: event handlers execute and
a notification is sent out. The check number is then immediately reset to 1. [1; 15]
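The transition from soft to hard states can be sketched as a small state machine in Python. This is a simplified model under stated assumptions: it treats all non-OK results alike, uses the maximum of four check attempts from the example above, and does not model event handlers or repeated notifications. The function name classify_checks is hypothetical.

```python
def classify_checks(results, max_check_attempts=4):
    """Return (result, state_type, notify) for each check result in order.

    A non-OK result is SOFT until it has been seen max_check_attempts times
    in a row; then it becomes HARD and a problem notification is sent. A
    return to OK after a hard problem sends a recovery notification.
    """
    timeline = []
    attempts = 0          # consecutive non-OK results seen so far
    hard_problem = False  # True once the problem has been confirmed
    for result in results:
        if result == "OK":
            if hard_problem:
                timeline.append((result, "HARD", True))   # hard recovery
            elif attempts > 0:
                timeline.append((result, "SOFT", False))  # soft recovery
            else:
                timeline.append((result, "HARD", False))  # still healthy
            attempts, hard_problem = 0, False
        elif hard_problem:
            timeline.append((result, "HARD", False))      # already reported
        else:
            attempts += 1
            if attempts >= max_check_attempts:
                hard_problem = True
                timeline.append((result, "HARD", True))   # problem confirmed
            else:
                timeline.append((result, "SOFT", False))  # recheck pending
    return timeline


if __name__ == "__main__":
    checks = ["OK"] + ["CRITICAL"] * 5 + ["OK"]
    for row in classify_checks(checks):
        print(row)
```

In this model a short glitch that clears before the fourth attempt never leaves the soft state and therefore never triggers an e-mail.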
4.3 Nagios Configuration Files
During Nagios installation Nagios configuration files are placed in /etc/nagios3 by
default. As shown in figure 7 Nagios has different configuration files, and some need to
be edited or created.
Figure 7: Nagios configuration files [14]
The roles of the files are explained as follows:
• Main configuration file: This is the most important file. It contains a
number of directives which affect the operation of the Nagios daemon,
and it is read by both the Nagios daemon and the CGIs. Nagios
starts its operation by reading this file first.
• Common Gateway Interface file (cgi.cfg): This file contains directives
which affect the operation of the CGIs, which generate the web
interface. It also contains a reference to the main configuration file, so
that the CGIs know how Nagios is configured and where the object
definitions are stored.
• Resource file: This file is mainly used to store sensitive
information such as passwords, as well as user-defined macros. The
CGIs do not read this file, which keeps the sensitive information out of
their reach.
• Object definition files: These are the files where all host and service
definitions are stored. Object definitions may include hosts, services,
host groups, service groups, time periods, commands, contacts and
contact groups. [1]
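How the main configuration file points to the other files can be illustrated with a short Python sketch that collects the cfg_file and cfg_dir directives, which name the object definition files and directories. The sample text is an illustrative excerpt in the style of /etc/nagios3/nagios.cfg, not a copy of the file used in this thesis, and the function name is hypothetical.

```python
def object_config_locations(main_cfg_text):
    """Collect the cfg_file and cfg_dir entries from main configuration text."""
    files, dirs = [], []
    for raw in main_cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "cfg_file":
            files.append(value)
        elif key == "cfg_dir":
            dirs.append(value)
    return files, dirs


# Illustrative excerpt; the paths follow the Ubuntu nagios3 package layout.
SAMPLE = """\
# object definitions
cfg_dir=/etc/nagios3/conf.d
cfg_file=/etc/nagios3/commands.cfg
resource_file=/etc/nagios3/resource.cfg
"""

if __name__ == "__main__":
    print(object_config_locations(SAMPLE))
```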
4.4 Nagios Plugins
Nagios cannot monitor network devices and services by itself; it needs external
programmes called plugins. Plugins are compiled executables or scripts written in
programming languages such as Shell, Perl and Python, and they are executed by
Nagios whenever the status of a network device or a service needs to be checked.
Plugins act as an abstraction layer between the Nagios daemon and the monitored
objects, i.e. they are the link between Nagios and the hosts or services. Nagios has
no knowledge of what is actually being monitored. It is the plugin that knows which
service or device is to be checked and how it is to be checked. Nagios only receives the
status information of these devices or services through the plugins. [2, 14]
Plugins do not come together with the Nagios package; they need to be installed
separately. There are more than 3000 Nagios plugins developed by the Nagios
community. In addition, there are 50 official Nagios plugins which are developed by
the official Nagios development team, and they are free of charge. It is also possible to
write one's own plugins when there is a need. [2, 3] In this thesis I used both plugins
developed by the official Nagios team and a plugin I developed myself. The official
plugins I used in this thesis included check_http, check_snmp,
check_ping, check_ntp and check_dns. I also wrote my own plugin, which
monitors the temperature of the server room.
Plugins are installed by default in the /usr/lib/nagios/plugins directory, but some
distributions install them in different locations. Plugins can be downloaded either
directly from the Nagios website or from http://nagiosplug-sourceforge.net.
4.5 Writing New plugins
One of the critical things to be monitored in this thesis was the temperature of the
server room. The system needs to monitor the room temperature so that, if it exceeds
the threshold limits, the network administrator is notified and can take action.
Figure 11: Arduino-based Ethernet web server
The first task was to acquire an IP-based device which would read the temperature of
the room. The school bought a device for this purpose a couple of times; however,
none of the devices could be used, because of missing permission rights to read live
data from their server. Finally, I decided to use an Arduino-based temperature reader
with a DHT11 sensor, built by a Metropolia student, as shown in Figure 11 above.
The device has an Ethernet shield, which makes it possible to display the temperature
and humidity of the room in a web browser. The device was delivered to me without a
manual, apart from a short report about it. The report states what type of sensor was
used and describes the sensor, but it does not state which libraries were used or which
programmes were needed to read from the sensor and display the readings in a web
browser.
I had to assign an IP address to the device and download the Arduino IDE
version 1.0.6 and a ready-written C++ programme from the internet. [15] The C++
programme was modified to suit my purpose, so that it read the temperature and
humidity values from the sensor and presented them in a web browser. Some important
libraries such as dht11.h and wire.h were also downloaded from the internet. [15]
Before the device can be used, the programme has to be compiled on a desktop
computer, uploaded to the device and run. The temperature and humidity values can
then be read by entering the IP address of the device in any web browser.
I was able to read the temperature and humidity values from the web browser. The
main task was then to monitor this device via Nagios by setting threshold values: if the
temperature reading exceeds these values, Nagios has to send an e-mail notification
to the network administrator. To make this happen, I looked for a Nagios plugin which
could read real-time values from the web page and raise an alert if the reading was
above the specified limits. Unfortunately, I could not find any ready-made Nagios plugin
for this task, so I decided to write my own plugin in Python to read these values.
I chose Python because it is easier to write and I know it better than
other languages; a friend also helped me in troubleshooting the code. I
tested the plugin and it worked fine.
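A plugin of this kind could look roughly like the following Python sketch. It is not the plugin written for this thesis. The page format ("Temperature (C): 24.00"), the script name check_room_temp.py, the command-line arguments and the threshold values in the usage note are all assumptions made for illustration. Only the exit codes (0 OK, 1 warning, 2 critical, 3 unknown) and the "message | performance data" output format follow the Nagios plugin convention.

```python
import re
import sys
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_temperature(page_text):
    """Extract the first temperature reading from the page text, or None.

    Assumes the page contains the word "Temperature" followed by the value,
    with no other digits in between.
    """
    match = re.search(r"Temperature[^0-9-]*(-?\d+(?:\.\d+)?)", page_text, re.IGNORECASE)
    return float(match.group(1)) if match else None


def evaluate(temperature, warn, crit):
    """Map a reading to a plugin exit code and a one-line status message."""
    if temperature is None:
        return UNKNOWN, "TEMPERATURE UNKNOWN - no reading found"
    perfdata = "temp=%.1f;%.1f;%.1f" % (temperature, warn, crit)
    if temperature >= crit:
        return CRITICAL, "TEMPERATURE CRITICAL - %.1f C | %s" % (temperature, perfdata)
    if temperature >= warn:
        return WARNING, "TEMPERATURE WARNING - %.1f C | %s" % (temperature, perfdata)
    return OK, "TEMPERATURE OK - %.1f C | %s" % (temperature, perfdata)


def main(argv):
    # Hypothetical usage: check_room_temp.py <url> <warn> <crit>
    if len(argv) != 4:
        print("TEMPERATURE UNKNOWN - usage: %s <url> <warn> <crit>" % argv[0])
        return UNKNOWN
    try:
        warn, crit = float(argv[2]), float(argv[3])
        with urllib.request.urlopen(argv[1], timeout=10) as response:
            page_text = response.read().decode("utf-8", errors="replace")
    except (ValueError, OSError) as exc:
        # Bad thresholds or an unreachable device are reported as UNKNOWN.
        print("TEMPERATURE UNKNOWN - %s" % exc)
        return UNKNOWN
    code, message = evaluate(parse_temperature(page_text), warn, crit)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

With placeholder thresholds, the script would be called as check_room_temp.py http://192.0.2.20/ 28 32 and registered in Nagios through a command definition, so that a reading at or above the warning or critical limit changes the service state and triggers the e-mail notification.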
Figure 12: Service state information of Arduino temperature sensor
As can be seen in Figure 12 above, Nagios successfully monitored the status of the
Arduino temperature sensor, and the system was able to send an e-mail notification
when the readings exceeded the limit.
5 Results and Discussions
All three monitoring solutions were able to monitor the status of the network devices
and services. Additionally, some of them, particularly Zabbix and Cacti, could
present the performance of network services such as bandwidth, disk space and
current load graphically, even though this was not among the main goals.
The implementation of the project was successful: I was able to implement a system
which monitored the status of network devices and reported by e-mail when
problems occurred in network devices and services. Figure 13 shows a
problem notification and a recovery notification sent by e-mail to the network
administrator about a network service.
Figure 13: Screenshot of an e-mail notification
The solution selected for the implementation of this project was Nagios. Even
though it was the best option to fulfil our goals and objectives, I would recommend
installing Cacti along with Nagios for a better graphical presentation of critical network
services. A problem encountered during the project was the availability of an
IP-based network device. We changed the device two times, and it was time-consuming
to get a new device. The notification mechanism in this thesis was e-mail. However, I
would suggest including other mechanisms such as SMS, because no e-mail
notification reaches the network administrator when there is no internet connection.
6 Conclusion
The main goal of the thesis was to implement a simple network monitoring system which
would ensure the availability of network devices and services at the Metropolia network
laboratory, Leppavaara campus. The system needed to be simple, cost-effective and
suitable for critical lab services and infrastructure components.
To implement a system which would meet the goals and objectives, three software
solutions, Nagios, Zabbix and Cacti, were selected, implemented and evaluated with
respect to their advantages and disadvantages. The option which best met the goals
and objectives of the thesis was Nagios. Using the Nagios monitoring
tool I was able to implement a system with the following key features:
• Open-source software
• Easily expandable to new targets
• Able to monitor the status of network devices and services
• Able to send notification e-mails to network admins when a
fault occurs in network devices and services
• Easy to upgrade and maintain
In general, implementing this kind of system helps to improve the quality of service by
verifying the health status of network devices and services at regular intervals. Another
benefit is that network admins can easily identify the source of network faults and take
action before they affect clients (end users).
References
1. Wojciech K. Learning Nagios 3.0. Birmingham, B27 6PA, UK: Packt Publishing
Ltd; 2008.
2. Wolfgang Barth. Nagios. San Francisco, CA 94107: No Starch Press, Inc.;
2010.
3. Esad S. and Ivan I. Network Monitoring and Management Recommendations
[online]. Serbia: AMRES led working group; February 2011.
URL: http://services.geant.net/cbp/Knowledge_Base/Network_Monitoring/Documents/gn3-na3-t4-abpd101.pdf.
Accessed April 17 2015.
4. Cisco. Network Management System: Best Practices White Paper [online]. San
Jose, CA: Cisco; June 2007.
URL: http://www.cisco.com/c/en/us/support/docs/availability/highavailability/15114-NMS-bestpractice.html.
Accessed March 24 2015.
5. Dr. Foroughi. Network Management [online]. University of Southern
Indiana: Indiana, USA; spring 2014.
URL: http://www.usi.edu/business/aforough/Chapter%2020.pdf. Accessed April
19 2015.
6. Aiko Pras. Network Management Architectures [online]. Hengelo, The
Netherlands: University of Twente; 1995.
URL: http://www.hit.bme.hu/~jakab/edu/litr/TMN/Network_Management_Architectures_extr.pdf.
Accessed February 2015.
7. Cisco. Network Management Basics [online]. San Jose, CA: Cisco Systems Inc.;
October 2012. URL: http://docwiki.cisco.com/wiki/Network_Management_Basics.
Accessed April 15 2015.
8. Justin Elingwood. An Introduction to SNMP [online]. ShareAlike 4.0
International: DigitalOcean Inc.; 2015.
URL: https://www.digitalocean.com/community/tutorials/an-introduction-to-snmpsimple-network-management-protocol.
Accessed March 2015.
9. Rihards Olups. Zabbix 1.8 Network Monitoring. Birmingham, B27 6PA, UK:
Packt Publishing Ltd.; 2010.
10. Zabbix SIA. Zabbix Documentation [online]. Birmingham: Share Alike 3.0;
2014. URL: https://www.zabbix.com/documentation/2.2/start. Accessed December 2014.
11. Ed Simmonds and Jason H. Evaluation of Nagios and Zabbix monitoring
[online]. Fermilab; February 2015.
URL: http://cd-docdb.fnal.gov/0032/003277/001/nagios_zabbix_evaluation.pdf.
Accessed March 2015.
12. Ian B., Tony R., Larry A. The Cacti Manual. The Cacti Group; 2012.
13. Dinangkur K. and S.M.Ibrahim. Cacti 0.8 Network Monitoring. Sydney: Packt
Publishing Ltd.; 2009.
14. Nagios Core. Nagios Core Documentation [online]. Nagios Enterprises, LLC;
2014. URL: http://nagios.sourceforge.net/docs/3_0/toc.html. Accessed November
2014.
15. Manoj Chauhan. Nagios Architecture [online]. DISQUS; January 2010.
URL: http://www.onaxer.com/2010/01/24/nagios-architecture/. Accessed March 2015.
16. Cisco. Network Management System [online]. Packet Storm; November 2002.
URL: http://dl.packetstormsecurity.net/defcon10/MoreInfo/NetworkManagementSystem-BestPractices.pdf.
Accessed April 2015.
17. Behrouz A. Data Communication and Networking. New York, America:
McGraw-Hill Companies, Inc.; 2007.