Nagios Tutorial

Thu 02 October 2014
By alex

"Nagios is an open source computer system monitoring, network monitoring and infrastructure monitoring software application. Nagios offers monitoring and alerting services for servers, switches, applications, and services. It alerts the users when things go wrong and alerts them when the problem has been resolved." -- wikipedia

Nagios is relatively easy to configure once you understand how its configuration system works. Nagios reads its configuration from a series of .cfg files, and settings are not restricted to specific files: when Nagios starts (or reloads), every object file referenced by the cfg_file and cfg_dir directives in nagios.cfg is read and all of the definitions are combined. This means you can have as many or as few config files as you want. It can also cause confusion if a stray definition sits in a file that an admin has forgotten about.
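Because a definition can live in any file Nagios loads, it helps to have a quick way to find which file defines a given object. The Python sketch below is a simplified model, not Nagios' own parser. It assumes all object files sit under one directory (as with a cfg_dir entry), walks that directory recursively for files ending in .cfg, and records each `define` block together with the file it came from. The names find_definitions and parse_block are hypothetical helpers.

```python
import os
import re
from typing import Dict, List, Tuple

# Matches "define <type>{ ... }" blocks. Simplification: assumes a block's
# body never contains a literal "}" character.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_block(body: str) -> Dict[str, str]:
    """Turn the body of one define block into a directive -> value dict."""
    props: Dict[str, str] = {}
    for raw in body.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line or line.startswith("#"):  # skip blanks and '#' comments
            continue
        parts = line.split(None, 1)  # directive, then the rest of the line
        props[parts[0]] = parts[1] if len(parts) > 1 else ""
    return props


def find_definitions(cfg_dir: str) -> List[Tuple[str, str, Dict[str, str]]]:
    """Return (file path, object type, properties) for every define block
    in every *.cfg file under cfg_dir, searched recursively."""
    found = []
    for root, _dirs, files in os.walk(cfg_dir):
        for name in sorted(files):
            if not name.endswith(".cfg"):
                continue  # only .cfg files are treated as object files here
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
            for obj_type, body in DEFINE_RE.findall(text):
                found.append((path, obj_type, parse_block(body)))
    return found
```

For example, filtering the result for `obj_type == "host"` and `props.get("host_name") == "mail.example.com"` shows which file actually holds that host, which is handy when chasing a forgotten definition.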

Nagios uses an inheritance system to assign properties to objects. The hierarchy is built from templates, hostgroups and servicegroups, and it works much like classes and instances in object-oriented programming: a template plays the role of a class, and a host or service that uses it behaves like an instance.

Host Creation
Template --- templates.cfg
        Host --- hostname.cfg

Service Assignment & Logical Grouping
Hostgroup --- hostgroup-name.cfg
        Services
                Host --- hostgroup-name.cfg or hostname.cfg

Servicegroup --- servicegroup-name.cfg
        Services
                Host --- hostgroup-name.cfg or hostname.cfg

The above diagram shows how hosts are created by referencing a host template in the templates.cfg file. It also shows how services are assigned to groups and how hosts are then made members of those groups. Every host in a hostgroup inherits all of the services assigned to that hostgroup, which saves the administrator from repeating the same service definition in multiple host configuration files. Servicegroups, by contrast, collect related services into a logical group (for example, for display in the web interface) rather than pushing services onto hosts. Custom, host-specific services should still be configured in the host file.

objects/servers/01-mail.cfg
define host{
        use                         linux-box
        host_name                   mail.example.com
        alias                       Email Server
        address                     xxx.xxx.xxx.xxx
        hostgroups                  mail.servers
        }

objects/templates.cfg
define host{
        name                        linux-box             
        use                         generic-host 
        check_period                24x7         
        check_interval              5            
        retry_interval              1                     
        max_check_attempts          2   
        check_command               check-host-alive 
        notification_period         24x7         
        notification_interval       20       
        notification_options        d,u,r         ; d=down, u=unreachable, r=recovery
        contact_groups              admins       
        register                    0     ;defines this as a template
        }

In the above example, the use directive references the linux-box template. Nagios' inheritance system gives mail.example.com all of the properties defined in linux-box, and because linux-box itself uses the generic-host template, the host also inherits the settings from generic-host. Any directive set directly in the host definition overrides the inherited value.
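The lookup can be modelled in a few lines of Python. This is a simplified sketch, not Nagios' actual implementation. It assumes each object names at most one template in use (Nagios also accepts a comma-separated list, which this sketch ignores), that templates are keyed by their name directive, and that a value set on the object itself wins over an inherited one. The function name resolve is hypothetical.

```python
from typing import Dict, Optional, Set

# Directives that describe the template itself rather than the monitored
# object; this sketch does not pass them down the chain.
NOT_INHERITED = {"name", "use", "register"}


def resolve(obj: Dict[str, str], templates: Dict[str, Dict[str, str]],
            _seen: Optional[Set[str]] = None) -> Dict[str, str]:
    """Return obj's effective properties after following its 'use' chain.

    Simplified model: a single template per 'use', local values win.
    """
    seen: Set[str] = set() if _seen is None else _seen
    inherited: Dict[str, str] = {}
    parent_name = obj.get("use")
    if parent_name:
        if parent_name in seen:
            raise ValueError(f"circular template chain at {parent_name!r}")
        seen.add(parent_name)
        # Recurse first so grandparent values are filled in before parent ones.
        inherited = resolve(templates[parent_name], templates, seen)
    # Start from the inherited values, then let the object's own directives override.
    effective = {k: v for k, v in inherited.items() if k not in NOT_INHERITED}
    effective.update({k: v for k, v in obj.items() if k not in NOT_INHERITED})
    return effective
```

Resolving the mail.example.com definition against the linux-box template this way yields check_interval 5 and contact_groups admins, plus anything generic-host sets that linux-box does not override.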

objects/servers/web-servers.cfg
define service{
        use                         generic-service
        hostgroup_name              web.servers
        service_description         HTTP
        check_command               check_http
        }  

objects/templates.cfg
define service{
        name                        generic-service         
        use                         local-service      
        max_check_attempts          2                     
        check_interval              5
        retry_interval              1
        register                    0            
        }      

This example shows a service definition from the hostgroup config file, web-servers.cfg. The service uses the generic-service template, which in turn uses the local-service template; both templates are defined in templates.cfg. Every member of the web.servers hostgroup will now have the HTTP service associated with it.
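Applying a service to a hostgroup is essentially a loop over the group's members. The Python sketch below models that expansion. It is an illustration, not Nagios code, and it assumes hostgroup membership has already been resolved into a plain dict of group name to host names. A service can name hosts directly with host_name or whole groups with hostgroup_name, and both directives accept comma-separated lists, so the sketch splits on commas. The names services_per_host and split_list are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def split_list(value: str) -> List[str]:
    """Split a comma-separated directive value, ignoring blank entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def services_per_host(services: List[Dict[str, str]],
                      hostgroups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each host to the service_description values that apply to it.

    A service applies to every host listed in host_name and to every
    member of each group listed in hostgroup_name.
    """
    result: Dict[str, List[str]] = defaultdict(list)
    for svc in services:
        targets = split_list(svc.get("host_name", ""))
        for group in split_list(svc.get("hostgroup_name", "")):
            targets.extend(hostgroups.get(group, []))
        for host in dict.fromkeys(targets):  # de-duplicate, keep order
            result[host].append(svc["service_description"])
    return dict(result)
```

Given the HTTP service above and a web.servers group containing web.example.com, the sketch maps web.example.com to the HTTP service without that service ever being written into 02-web.cfg.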

objects/servers/web-servers.cfg
define hostgroup{
        hostgroup_name              web.servers
        alias                       Web Servers            
        members                     web.example.com
        }

objects/servers/02-web.cfg
define host{
        use                         linux-box
        host_name                   web.example.com
        alias                       Webserver
        address                     xxx.xxx.xxx.xxx
        hostgroups                  web.servers
        }

objects/servers/all.cfg
define hostgroup{
        hostgroup_name              all
        alias                       All
        members                     *
        }

Hostgroup membership is defined by the members directive in the hostgroup config. It can also be configured with the hostgroups directive in the host config file. The first method makes management much easier, because hostgroup membership can be changed by editing a single file. To create a group that contains every host, enter a * wildcard instead of listing each host by name; Nagios interprets the wildcard to mean all hosts. This is best used to assign a base set of services to every host.
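Because membership can be declared from either side, a group's effective member list is the union of both sources. The Python sketch below models this, including the * wildcard. It is an illustrative simplification that ignores other membership features such as the hostgroup_members directive, and the function name resolve_hostgroups is hypothetical.

```python
from typing import Dict, List, Set


def resolve_hostgroups(hosts: Dict[str, Dict[str, str]],
                       hostgroups: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Return hostgroup name -> sorted member host names.

    hosts maps host_name -> host directives; hostgroups maps
    hostgroup_name -> hostgroup directives.
    """
    members: Dict[str, Set[str]] = {name: set() for name in hostgroups}
    # Source 1: the members directive inside each hostgroup definition.
    for group, props in hostgroups.items():
        for entry in props.get("members", "").split(","):
            entry = entry.strip()
            if entry == "*":
                members[group].update(hosts)  # wildcard: every known host
            elif entry:
                members[group].add(entry)
    # Source 2: the hostgroups directive inside each host definition.
    for host, props in hosts.items():
        for group in props.get("hostgroups", "").split(","):
            group = group.strip()
            if not group:
                continue
            if group not in members:
                # Treat a reference to an undefined group as a config error.
                raise ValueError(f"host {host!r} references unknown hostgroup {group!r}")
            members[group].add(host)
    return {group: sorted(names) for group, names in members.items()}
```

With the all group defined as `members *`, every host lands in all, which is why a service attached to that group reaches every host.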

define service{
        use                         generic-service
        hostgroup_name              all
        service_description         SSH
        check_command               check_ssh!22    ; 22 is passed to check_ssh as $ARG1$
        }

define command{
        command_name                check_ssh
        command_line                $USER1$/check_ssh -p $ARG1$ $HOSTADDRESS$
        }

Services ultimately rely on external commands that Nagios executes. A service specifies the command to run with the check_command directive, and the value of check_command must match the command_name of a command definition (conventionally kept in commands.cfg). A command definition gives the command a name for Nagios to use and a template for the options that will be passed to it. In the check_ssh definition above, $USER1$ is a user macro (usually set in resource.cfg to the plugin directory) and $HOSTADDRESS$ is replaced with the address of the host being checked.

The best way to understand how to use a command is to run the actual script or executable found in nagios/libexec by hand. That lets the admin learn which switches and arguments best suit their specific requirements. Once a suitable usage has been identified and a command definition has been created, arguments are supplied in the service definition by appending them to the command name, each preceded by an exclamation point: check_ssh!22 passes 22 to the command as $ARG1$.
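To make the exclamation-point syntax concrete, here is a Python sketch of how a check_command value turns into a command line. The part before the first ! names the command definition, and each following field fills $ARG1$, $ARG2$, and so on. This is a simplified model rather than Nagios' own macro engine: it only substitutes the macros it is given, and the function name expand_check is hypothetical.

```python
import re
from typing import Dict


def expand_check(check_command: str, commands: Dict[str, str],
                 macros: Dict[str, str]) -> str:
    """Expand a service's check_command into the command line to run.

    commands maps command_name -> command_line; macros holds values for
    non-argument macros such as USER1 and HOSTADDRESS (names without '$').
    """
    name, *args = check_command.split("!")
    template = commands[name]
    values = dict(macros)
    # Positional arguments after each '!' become ARG1, ARG2, ...
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg

    def substitute(match: "re.Match[str]") -> str:
        # Macros with no known value expand to an empty string in this sketch.
        return values.get(match.group(1), "")

    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, template)
```

Assuming $USER1$ is /usr/local/nagios/libexec and the host's address is 192.0.2.10 (both assumed values for illustration), expanding check_ssh!22 against the check_ssh definition above gives `/usr/local/nagios/libexec/check_ssh -p 22 192.0.2.10`.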

Out of the box, Nagios is a powerful monitoring solution. However, its true power lies in understanding it well enough to configure it to suit the needs of a specific environment.