Nagios monitoring for AWS autoscaling instances behind the ELB

Setting up Nagios monitoring alerts on instances behind the AWS Elastic Load Balancer (ELB) is always a tricky part. In this post, we will learn how to monitor instances that are created by an autoscaling policy and are running behind the ELB.

We expect the reader to have good knowledge of scripting, Nagios and AWS computing. This will help you quickly understand the scenario and the method we used here.

Challenges

1. In AWS autoscaling, new instances get created automatically as per the defined scaling conditions, so the system admin has the tedious job of continually adding new instances to the Nagios server.

Ideas and Approaches to monitor AWS autoscaling group instances

Logic: In AWS Autoscaling, all servers are launched from a common AMI. Hence, all of them have the same configuration and the same monitoring requirements.

Ideas and Possible Methods

Idea 1: Run a script from the Nagios server which keeps checking whether any new server has been launched in a particular AWS autoscaling group. The script creates the required Nagios configuration files on the Nagios server.

Idea 2: Run a script when the autoscaling server launches or boots. In this idea, the script creates the Nagios host config file at boot time and SFTPs it to the desired location on the Nagios server. I applied this in practice and it worked too, but later found that sometimes the SFTP transfer did not complete properly because of the time taken to boot, failed connections to the server, and so on.

Idea 3: Automation tools like Puppet and Chef can also be used here. You only have to figure out how you will get the information about new instances launched in the Autoscaling group.

Approach to setup Nagios monitoring on AWS Autoscaling

We will use Idea 1 as described in the section above. Given below are the important points on which it is based.

(1) Services File: Create a common Nagios services file for the autoscaling group, called ASG_services.cfg.
(2) Host Group File: Pre-define the hostgroup name with a simple configuration. We are not adding members directly in this hostgroup file. Our hostgroup file for autoscaling is named ASG_groups.cfg.
(3) Host File For Each Instance: The script called nagios_asg.sh fetches the instance IDs of the instances running behind the ELB and creates a Nagios host file for each instance ID on the Nagios server. The resulting layout on the Nagios server is sketched just after this list.
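All of these pieces live under one project directory on the Nagios server. The directory and file names below are the example names used in the steps of this post:

/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/
    ASG_groups.cfg      <- hostgroup definition (step B)
    ASG_services.cfg    <- common service checks (step C)
    Production/         <- per-instance host files created by nagios_asg.sh (step D)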

Let's start setting up the Nagios monitoring for the AWS autoscaling group.

We are doing all the steps on the Nagios server.

(A) Create a new directory dedicated to autoscaling inside the Nagios setup directory

mkdir -p /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A

Edit /usr/local/nagios/etc/nagios.cfg

vi /usr/local/nagios/etc/nagios.cfg

Add a new line as given below.

cfg_dir=/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A
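If you prefer not to open an editor, the same line can be appended non-interactively; this one-liner only adds it when it is not already present:

grep -qF 'cfg_dir=/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A' /usr/local/nagios/etc/nagios.cfg || echo 'cfg_dir=/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A' >> /usr/local/nagios/etc/nagios.cfg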

(B) Create the Nagios hostgroup file for AWS autoscaling:

Create a new Nagios hostgroup file called ASG_groups.cfg inside the newly created directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A.

vi /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/ASG_groups.cfg

Now add the content given below to the ASG_groups.cfg file. You can change the defined hostgroup_name as per your convenience.

define hostgroup {
    hostgroup_name    PROJECT_EXAMPLE_A_autoscale
    alias             PROJECT_EXAMPLE_A_autoscale
}
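Note: until the script creates the first host file, this hostgroup has no members. On some Nagios Core versions, assigning services to an empty hostgroup makes the configuration check fail; if you run into that, the usual workaround is to set the following option in nagios.cfg:

allow_empty_hostgroup_assignment=1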

(C) Create the common Nagios services file for autoscaling

Now create the common Nagios services file for autoscaling, called ASG_services.cfg, inside the directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/.

vi /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/ASG_services.cfg

Given below are the sample alerts which we have placed; you can add your own alerts. The main role is played by hostgroup_name, hence do not skip it.

define service {
    hostgroup_name         PROJECT_EXAMPLE_A_autoscale
    service_description    Root Partition Usage
    check_command          check_nrpe!-H $HOSTADDRESS$ -c check_sda
    use                    generic-service
}

define service {
    hostgroup_name         PROJECT_EXAMPLE_A_autoscale
    service_description    Memory Usage
    check_command          check_nrpe!-H $HOSTADDRESS$ -c check_mem -t 60
    use                    generic-service
}

define service {
    hostgroup_name         PROJECT_EXAMPLE_A_autoscale
    service_description    CPU-Avg
    check_command          check_nrpe!-H $HOSTADDRESS$ -c check_cpu -t 60
    use                    generic-service
}

define service {
    hostgroup_name         PROJECT_EXAMPLE_A_autoscale
    service_description    Current Users
    check_command          check_nrpe!-H $HOSTADDRESS$ -c check_users -t 60
    use                    generic-service
}
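Two assumptions are baked into these definitions. First, the check_command lines pass the full NRPE argument string to check_nrpe, which works when your check_nrpe command object is defined roughly like this in commands.cfg (adjust if yours already passes -H $HOSTADDRESS$ itself):

define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe $ARG1$
}

Second, the NRPE agent baked into the AMI must expose the commands check_sda, check_mem, check_cpu and check_users. A sketch of matching client-side nrpe.cfg entries is given below; the plugin paths, scripts and thresholds are only examples, so use whatever your AMI actually ships:

command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 80 -c 90
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -w 80 -c 90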

(D) Create the nagios_asg.sh script

Before proceeding, complete the following prerequisites.

Prerequisites for nagios_asg.sh script

1. Install the AWS CLI tool on the system:
You must already have pip installed on the system (follow this guide to install pip).

pip install awscli
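Verify that the CLI is installed and on the PATH:

aws --version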

2. Secret key / access key of an IAM user:

Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.

(a) In the navigation pane, choose Users.
(b) Choose the name of the desired user which can read the ELB and the instances behind it, and then choose the Security Credentials tab. The user's access keys and the status of each key are displayed.

Note: Only the user’s access key ID is visible. The secret access key can only be retrieved when creating the key.
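The script only calls two read-only APIs, elasticloadbalancing:DescribeLoadBalancers and ec2:DescribeInstances, so the IAM user does not need anything broader. A minimal policy for this user could look like the following (attach it to the user whose keys you will put in the script):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticloadbalancing:DescribeLoadBalancers",
        "ec2:DescribeInstances"
      ],
      "Resource": "*"
    }
  ]
}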

Now create the Nagios AWS autoscaling group script called nagios_asg.sh. This script plays the most important role in the whole setup: it creates a new Nagios host file for each instance running behind the ELB.

IMPORTANT Note: Define your own variable values in the script as per your setup, including the IAM user access key and secret key. In short, please read the script carefully and change it to match your requirements.

vi /opt/nagios_asg.sh

Copy and paste the following content into the script, then make the changes required for your setup.

#!/bin/bash
#
# Author : Sharad Kumar Chhetri
# Version : 1.0
# Date : 4-Sept-2015
# Description : Check the project autoscaling group and create any Nagios host files that are not available yet
# Blog : https://sharadchhetri.com

### Set AWS KEY of IAM user which can fetch the ELB information.
export AWS_ACCESS_KEY_ID=sdfhwqkjeluw/er29kasd
export AWS_SECRET_ACCESS_KEY=fdzfzdjfljzwerpiofn934kfe/3mr9nkds

## File names for storing intermediate output
_PROJECT_ELB_INSTANCES=elb_instance
_PROJECT_NAGIOS_EXISTING_FILES=nagios_existing_files

### Check the ELB name in the AWS console and set the value in variable _PROJECT_ELB_NAME
_PROJECT_ELB_NAME=your-project-elb-name

### Give the AWS region name where the ELB exists
_AWS_REGION_NAME=ap-southeast-1

### Give nagios host group name
_GROUP_NAME=PROJECT_EXAMPLE_A_autoscale

### Absolute path of nagios config directory set for Autoscaling Group , we have addressed this in Step A in this blog post

_CURRENT_DIR=/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A


### Give the environment name e.g. production/staging/Dev/Test
_ENVIRONMENT_NAME_=Production

if [ ! -d "$_CURRENT_DIR/$_ENVIRONMENT_NAME_" ]; then
mkdir -p "$_CURRENT_DIR/$_ENVIRONMENT_NAME_"
fi

### Function: write the Nagios host definition file for one instance
nagiosconfigfile () {

_HOST_CFG="$_CURRENT_DIR/$_ENVIRONMENT_NAME_/$MY_PROJECT.cfg"

echo "define host{" > "$_HOST_CFG"
echo "    use           linux-server" >> "$_HOST_CFG"
echo "    host_name     ${_ENVIRONMENT_NAME_}${MY_PROJECT}" >> "$_HOST_CFG"
echo "    alias         ${_ENVIRONMENT_NAME_}${MY_PROJECT}" >> "$_HOST_CFG"
echo "    hostgroups    $_GROUP_NAME" >> "$_HOST_CFG"
echo "    address       $_PROJECT_PUBLICIP" >> "$_HOST_CFG"
echo '}' >> "$_HOST_CFG"
echo "" >> "$_HOST_CFG"

}

### Save the IDs of the instances currently behind the ELB to a file (variable name = _PROJECT_ELB_INSTANCES)
aws --region $_AWS_REGION_NAME elb describe-load-balancers --load-balancer-names $_PROJECT_ELB_NAME --output text | grep INSTANCES | awk '{print $2}' > $_PROJECT_ELB_INSTANCES

### List the instance IDs whose config files already exist in the nagios environment directory
ls -1 "$_CURRENT_DIR/$_ENVIRONMENT_NAME_" | sed 's/\.cfg$//' > $_PROJECT_NAGIOS_EXISTING_FILES

## Compare the two files (_PROJECT_ELB_INSTANCES against _PROJECT_NAGIOS_EXISTING_FILES)
### Each instance ID that does not have a config file yet is read into the variable MY_PROJECT

grep -v -f $_PROJECT_NAGIOS_EXISTING_FILES $_PROJECT_ELB_INSTANCES | while read MY_PROJECT
do
    _PROJECT_PUBLICIP=$(aws --region $_AWS_REGION_NAME ec2 describe-instances --instance-ids $MY_PROJECT --query 'Reservations[].Instances[].[PublicIpAddress]' --output text)

    ### Create the host file and reload Nagios only when the instance actually has a public IP
    if [ -n "$_PROJECT_PUBLICIP" ] && [ "$_PROJECT_PUBLICIP" != "None" ]
    then
        nagiosconfigfile

        ##### If the Nagios server is not CentOS 7 / RHEL 7, use 'service nagios reload' instead of systemctl.
        systemctl reload nagios
    fi

done

### Unset the variable value AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY

## End Of Line ##

Give executable permission to the script.

chmod +x /opt/nagios_asg.sh

Explore the directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A; inside it you will find the two common files and the environment directory. Change into the environment directory you defined in the script and you will see one Nagios host file per instance, as in the example below.
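For example, after the first run the environment directory might contain files like these (the instance IDs here are just placeholders):

ls -1 /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/Production
i-0123456789abcdef0.cfg
i-0fedcba9876543210.cfg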

Check the script by running it manually before adding it to crontab. In crontab, we run this nagios_asg.sh script every 4 minutes.

crontab -e -u root
*/4 * * * * /opt/nagios_asg.sh
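Before relying on the cron entry, you can run the script once by hand and then ask Nagios to verify the generated configuration (paths as in the source install used throughout this post):

/opt/nagios_asg.sh
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg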

We have been running this method successfully with no issues so far. We have tried to explain our logic and method as clearly as possible; we hope it will benefit you as well.
