Nagios monitoring on AWS Autoscaling group instances

Setting up Nagios monitoring alerts for instances behind an AWS Elastic Load Balancer (ELB) is always a tricky part. In this post, we will learn how to monitor instances that are created by an autoscaling policy and run behind the ELB.

We expect readers to have a good knowledge of scripting, Nagios and AWS computing. This will help them quickly understand the scenario and the method used here.

Challenges

1. In AWS autoscaling, new instances are created automatically according to the defined conditions. Adding every new instance to the Nagios server by hand is therefore a tedious job for the system admin.

Ideas and Approaches to monitor AWS autoscaling group instances

Logic: In AWS Autoscaling, servers are launched from a common AMI. Hence, all of them have the same configuration and the same monitoring requirements.

Ideas and Possible Methods

Idea 1: Run a script on the Nagios server that keeps checking whether any new server has been launched in a particular AWS autoscaling group. This script creates the required Nagios configuration file on the Nagios server.

Idea 2: Run a script when the autoscaling server launches or boots. In this idea, the script creates the Nagios host config file at boot time and copies it by sftp to the desired location on the Nagios server. I applied this in practice and it worked. Later I found that the sftp transfer sometimes did not complete, for example because of the time taken in booting or a failed connection to the server.

Idea 3: Automation tools like Puppet or Chef can also be used here. You only have to figure out how to get the information about a new instance launched in the Autoscaling group.

Our Approach to setup Nagios monitoring on AWS Autoscaling

We will use Idea 1 as described in the section above. It is based on the following important points.

(1) Services File: Create a common Nagios services file for autoscaling called ASG_services.cfg.
(2) Host Group File: Pre-define the hostgroup name with a simple configuration. We do not add members directly in this hostgroup file. Our hostgroup file for autoscaling is named ASG_groups.cfg.
(3) Host File For Each Instance: The script called nagios_asg.sh fetches the instance ids of the instances available behind the ELB. The script also creates a Nagios host file for each instance id on the Nagios server.
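
The core of point (3) is a set difference: the instance ids reported by the ELB minus the instance ids that already have a host file. A minimal sketch of that comparison follows. The function name new_instance_ids and the sample ids are illustrative assumptions and are not taken from the script given later in this post.

```shell
#!/bin/bash
# Sketch only: print the instance ids that appear in the ELB list
# but do not yet have a Nagios host file.
# new_instance_ids is a hypothetical helper name used for illustration.
new_instance_ids () {
    local elb_list="$1"       # file with one instance id per line (from the ELB)
    local existing_list="$2"  # file with one instance id per line (host files already present)
    # -F fixed strings, -x whole-line match, -v invert the match, -f read patterns from a file
    grep -v -x -F -f "$existing_list" "$elb_list"
}
```

For example, if the first file contains i-aaa and i-bbb and the second file contains only i-aaa, the function prints i-bbb. Matching whole lines as fixed strings avoids accidental partial matches between similar ids.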

Let's start setting up the Nagios monitoring for the AWS autoscaling group.

All of the steps are performed on the Nagios server.

(A) Create a new directory dedicated to autoscaling inside the Nagios setup directory

mkdir -p /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A

Edit /usr/local/nagios/etc/nagios.cfg

vi /usr/local/nagios/etc/nagios.cfg

Add a new line as given below.

cfg_dir=/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A
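
If this step is repeated or scripted, the cfg_dir line can end up in nagios.cfg more than once. The sketch below appends the line only when it is missing. The helper name ensure_cfg_dir is hypothetical, and the path of the nagios binary in the usage comment is an assumption based on the /usr/local/nagios prefix used in this post.

```shell
#!/bin/bash
# Sketch only: append a cfg_dir entry to nagios.cfg unless it is already there.
# ensure_cfg_dir is a hypothetical helper name.
ensure_cfg_dir () {
    local cfg_file="$1"   # e.g. /usr/local/nagios/etc/nagios.cfg
    local dir="$2"        # e.g. /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A
    # -q quiet, -x whole-line match, -F fixed string
    if ! grep -q -x -F "cfg_dir=$dir" "$cfg_file"; then
        echo "cfg_dir=$dir" >> "$cfg_file"
    fi
}

# Example (run as root on the Nagios server):
#   ensure_cfg_dir /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   # verify the configuration (assumed binary path)
```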

(B) Create Nagios group file for AWS autoscaling:

Create a new Nagios host group file called ASG_groups.cfg inside the newly created directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A.

vi /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/ASG_groups.cfg

Now add the content given below to the ASG_groups.cfg file. You can replace the hostgroup_name with a name of your choice, as long as you use the same name in the services file and in the script.

define hostgroup {
        hostgroup_name  PROJECT_EXAMPLE_A_autoscale
        alias           PROJECT_EXAMPLE_A_autoscale
}

(C) Create Autoscaling common nagios services file

Now create the common Nagios services file for autoscaling, called ASG_services.cfg, inside the directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/.

vi /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/ASG_services.cfg

Given below are the sample alerts that we have placed. You can add your own alerts. The hostgroup_name plays the main role, so do not skip it.


define service {
        hostgroup_name PROJECT_EXAMPLE_A_autoscale
        service_description     Root Partition Usage
        check_command           check_nrpe!-H $HOSTADDRESS$ -c check_sda
        use     generic-service
}

define service {
        hostgroup_name PROJECT_EXAMPLE_A_autoscale
        service_description     Memory Usage
        check_command           check_nrpe!-H $HOSTADDRESS$  -c check_mem -t 60
        use     generic-service
}

define service {
        hostgroup_name PROJECT_EXAMPLE_A_autoscale
        service_description     CPU-Avg
        check_command           check_nrpe!-H $HOSTADDRESS$  -c check_cpu -t 60
        use     generic-service
}

define service {
        hostgroup_name PROJECT_EXAMPLE_A_autoscale
        service_description     Current Users
        check_command           check_nrpe!-H $HOSTADDRESS$  -c check_users -t 60
        use     generic-service
}
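
Before relying on these service definitions, it can help to confirm from the Nagios server that one instance answers each NRPE command. The sketch below is an assumption-laden example: the helper name probe_nrpe_commands is hypothetical, the check_nrpe path is a guess based on the /usr/local/nagios prefix, and it presumes that the NRPE configuration on the instance defines check_sda, check_mem, check_cpu and check_users.

```shell
#!/bin/bash
# Sketch only: run each NRPE command used in ASG_services.cfg against one host.
# The CHECK_NRPE path is an assumption; adjust it to your plugin directory.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# probe_nrpe_commands is a hypothetical helper name.
probe_nrpe_commands () {
    local host="$1"
    local cmd not_ok=0
    for cmd in check_sda check_mem check_cpu check_users; do
        # check_nrpe exits 0 for OK and non-zero for WARNING, CRITICAL or UNKNOWN
        if "$CHECK_NRPE" -H "$host" -c "$cmd" -t 60; then
            echo "OK: $cmd"
        else
            echo "NOT OK: $cmd"
            not_ok=1
        fi
    done
    return "$not_ok"
}
```

Calling probe_nrpe_commands with the address of one autoscaling instance prints one result line per command. A NOT OK line can mean either a connection problem or a check that is in a warning or critical state.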

(D) Create nagios_asg.sh script

Before proceeding, complete the following prerequisites.

Prerequisites for nagios_asg.sh script

1. Install the AWS CLI tool on the system:
You must already have pip installed on the system (Follow this guide to install pip).

pip install awscli

2. Access key and secret key of an IAM user:

Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.

(a) In the navigation pane, choose Users.
(b) Choose the name of the user that can read the ELB and the instances behind it, and then choose the Security Credentials tab. The user’s access keys and the status of each key are displayed.

Note: Only the user’s access key ID is visible. The secret access key can only be retrieved when creating the key.
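
It is worth checking that the keys actually work and can read the ELB before they go into the script. The sketch below uses the aws sts get-caller-identity and aws elb describe-load-balancers commands. The helper name check_aws_access is hypothetical, and the region and ELB name passed to it are placeholders for your own values.

```shell
#!/bin/bash
# Sketch only: confirm that the exported IAM keys are usable and can read the ELB.
# check_aws_access is a hypothetical helper name.
check_aws_access () {
    local region="$1" elb_name="$2"
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be exported first" >&2
        return 1
    fi
    # Which IAM identity do these keys belong to?
    aws sts get-caller-identity --output text || return 1
    # Can that identity list the instances registered with the ELB?
    aws --region "$region" elb describe-load-balancers \
        --load-balancer-names "$elb_name" \
        --query 'LoadBalancerDescriptions[].Instances[].InstanceId' \
        --output text
}

# Example, with placeholder values:
#   export AWS_ACCESS_KEY_ID=your-access-key-id
#   export AWS_SECRET_ACCESS_KEY=your-secret-access-key
#   check_aws_access ap-southeast-1 your-project-elb-name
```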

Create the Nagios AWS autoscaling group script called nagios_asg.sh. This script plays an important role in the whole setup. It creates a new Nagios host file for each instance running behind the ELB.

IMPORTANT Note: Define your own variable values in the script as per your setup. Define the IAM user access key and secret key in the file. In short, please read the script carefully and make the changes as per your requirement.

vi /opt/nagios_asg.sh

Copy and paste the content below into the script, then make your changes.

#!/bin/bash
#
# Author : Sharad Kumar Chhetri
# Version : 1.0
# Date : 4-Sept-2015
# Description : Check Autoscaling of the Project and create non available nagios host
# Blog : https://sharadchhetri.com

### Set AWS KEY of IAM user which can fetch the ELB information.
export AWS_ACCESS_KEY_ID=sdfhwqkjeluw/er29kasd
export AWS_SECRET_ACCESS_KEY=fdzfzdjfljzwerpiofn934kfe/3mr9nkds

## File names for storing output values
## (a hyphen is not valid in a shell variable name, so only underscores are used)
_PROJECT_ELB_INSTANCES=elb_instance
_PROJECT_NAGIOS_EXISTING_FILES=nagios_existing_files

### Check ELB Name from AWS console and set the value in variable _PROJECT_ELB_NAME
_PROJECT_ELB_NAME=your-project-elb-name

### Give the AWS region name where the ELB exists
_AWS_REGION_NAME=ap-southeast-1

### Give nagios host group name
_GROUP_NAME=PROJECT_EXAMPLE_A_autoscale

### Absolute path of nagios config directory set for Autoscaling Group, we have addressed this in Step A in this blog post

_CURRENT_DIR=/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A


### Give environment name, e.g. Production/Staging/Dev/Test
_ENVIRONMENT_NAME_=Production

if [ ! -d "$_CURRENT_DIR/$_ENVIRONMENT_NAME_" ]; then
  mkdir -p "$_CURRENT_DIR/$_ENVIRONMENT_NAME_"
fi

### Function: write the nagios host file for the instance id held in MY_PROJECT
nagiosconfigfile () {

        cat > "$_CURRENT_DIR/${_ENVIRONMENT_NAME_}${MY_PROJECT}.cfg" <<EOF
define host{
        use                     linux-server
        host_name               ${_ENVIRONMENT_NAME_}${MY_PROJECT}
        alias                   ${_ENVIRONMENT_NAME_}${MY_PROJECT}
        hostgroups              $_GROUP_NAME
        address                 $_PROJECT_PUBLICIP
}

EOF

}

### Ids of the instances behind the ELB are saved in a file (Variable Name = _PROJECT_ELB_INSTANCES)
aws --region "$_AWS_REGION_NAME" elb describe-load-balancers --load-balancer-names "$_PROJECT_ELB_NAME" --output text | grep INSTANCES | awk '{print $2}' > "$_PROJECT_ELB_INSTANCES"

### List the instance ids whose config files already exist in the nagios dir.
### Only names of the form <environment><instance-id>.cfg are kept, so the environment
### directory and other files cannot produce an empty pattern that would match every line.
ls -1 "$_CURRENT_DIR" | sed -n "s/^${_ENVIRONMENT_NAME_}\(..*\)\.cfg\$/\1/p" > "$_PROJECT_NAGIOS_EXISTING_FILES"

## Compare two files (variables _PROJECT_ELB_INSTANCES and _PROJECT_NAGIOS_EXISTING_FILES)
### Each instance id without a host file is then stored in the variable called MY_PROJECT

grep -v -x -F -f "$_PROJECT_NAGIOS_EXISTING_FILES" "$_PROJECT_ELB_INSTANCES" | while read -r MY_PROJECT
do
_PROJECT_PUBLICIP=$(aws --region "$_AWS_REGION_NAME" ec2 describe-instances --instance-ids "$MY_PROJECT" --query 'Reservations[].Instances[].[PublicIpAddress]' --output text)

### Create the host file only when a public IP was returned (the text output is "None" when there is no public IP)
if [ -n "$_PROJECT_PUBLICIP" ] && [ "$_PROJECT_PUBLICIP" != "None" ]
then
nagiosconfigfile

##### If Nagios server is not CentOS 7 / RHEL 7 then use command 'service nagios restart' instead of systemctl.

systemctl reload nagios
fi

done

### Unset the variable value AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY

## End Of Line ##

Give executable permission to the script.

chmod +x /opt/nagios_asg.sh

After the script has run, explore the directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A. You will find the environment directory defined in the script and one Nagios host file per instance, named with the environment name followed by the instance id.

Test the script by running it manually before adding it to crontab. In crontab, we run the nagios_asg.sh script every 4 minutes.

crontab -e -u root
*/4 * * * * /opt/nagios_asg.sh
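
Because cron runs the script unattended, a broken host file could otherwise reach a running Nagios without anyone noticing. One possible safeguard, sketched below, is to verify the configuration with nagios -v and reload only when the verification passes. The helper name reload_nagios_if_valid and the three variables are hypothetical, and the binary path is an assumption based on the /usr/local/nagios prefix used in this post.

```shell
#!/bin/bash
# Sketch only: reload Nagios only when the generated configuration passes verification.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"      # assumed install path
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RELOAD_CMD="${RELOAD_CMD:-systemctl reload nagios}"

# reload_nagios_if_valid is a hypothetical helper name.
reload_nagios_if_valid () {
    # nagios -v checks the whole configuration and exits non-zero on errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" > /dev/null; then
        $RELOAD_CMD
    else
        echo "nagios.cfg verification failed; not reloading" >&2
        return 1
    fi
}
```

A function like this could stand in for the plain reload command in nagios_asg.sh, so that a failed verification leaves the running Nagios untouched.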

We have been running this method successfully, with no issues so far. We have tried to explain our logic and method as much as possible. We hope it benefits you as well.

24 thoughts on “Nagios monitoring on AWS Autoscaling group instances”

  1. Hi Sharad,

    Good day!

    I’m also encountering this issue.

    nagios_asg.sh: 14: nagios_asg.sh: _PROJECT-ELB_INSTANCES=/usr/local/nagios/etc/objects/Dev04/elb_instance: not found
    nagios_asg.sh: 15: nagios_asg.sh: _PROJECT-NAGIOS_EXISTING_FILES=/usr/local/nagios/etc/objects/Dev04/nagios_existing_files: not found
    nagios_asg.sh: 18: nagios_asg.sh: _PROJECT-ELB_NAME=MyApp-ALB: not found

    Unknown options: -ELB_NAME
    grep: _INSTANCES: invalid context length argument

    could you enlighten us on how we should set the configuration for this?

    Thank you

  2. Hello,

    nice post!

    I’m trying to setup, but I’m getting a few errors:

    # sh nagios_asg.sh
    nagios_asg.sh: 14: nagios_asg.sh: _PROJECT-ELB_INSTANCES=/usr/local/nagios/etc/objects/Dev04/elb_instance: not found
    nagios_asg.sh: 15: nagios_asg.sh: _PROJECT-NAGIOS_EXISTING_FILES=/usr/local/nagios/etc/objects/Dev04/nagios_existing_files: not found
    nagios_asg.sh: 18: nagios_asg.sh: _PROJECT-ELB_NAME=MyApp-ALB: not found

    Unknown options: -ELB_NAME
    grep: _INSTANCES: invalid context length argument

    Could you please help me what I’m doing wrong?

    Thanks!

  3. Hi Sharad i need some help
    I need to setup nagios monitoring for the autoscaling group , there are multiple problems i am having are.
    1. There is more than 1 autoscaling group for which i have to setup automated monitoring.
    2. The instances which are added while autoscaling by script should also get removed at downscale.
    3. The loadbalancer we are using is Application loadbalancer for which this command from your script does not work
    aws --region $_AWS_REGION_NAME elb describe-load-balancers --load-balancer-names $_PROJECT-ELB_NAME --output text|grep INSTANCES|awk '{print $2}' > $_PROJECT-ELB_INSTANCES

    i tried changing elb keyword to elbv2 in the command but didn’t worked.

    And i am unable to under stand which files you are referring to in this lines
    _PROJECT-ELB_INSTANCES=elb_instance
    _PROJECT-NAGIOS_EXISTING_FILES=nagios_existing_files
    Please explain if possible
    Thank you for your help in advance waiting for your reply.

    • Hi Rajesh,

      I hope you understand the logic used here. Rest is the way to do by using command line.
      I do not have this setup right now but if time allows will check. Expect the delay.

      Regards
      Sharad

  4. Hi Sharad,

    can I use this script in the local environment instead of AWS autoscaling?
    What ‘ll be necessary changes required if I have to use this script on the local Nagios server? I want to connect Linux client to Nagios server without doing any changes to the server.

    • Hi Anish,

      Yes you can use this script but you have to modify the lines wherever AWS command require.
      If you could elaborate your scenario then I can suggest you in better way.

      Regards
      Sharad

  5. This is great! Any ideas on how to restart the nagios service ONLY if a new machine spins up? Say I currently have machines 1 and 2 up, then machine 3 spins up. So I want it to leave the configs for 1 and 2 alone so that uptime is not affected, but add machine 3.

      • Apologies I don’t think I was very clear. So if I have two machines, and one dies, auto scaling will spin up another machine. So now I want Nagios to keep machine one, and machine three, and remove machine two. It looks like your script handles all but the removal? How would you add that?

        • Hi Jake,

          I think I understand your query now.Removal function is intentionally not added in this script.
          It has strong reason – If anybody stop or remove the machine(EC2) by mistake/accidentally then nagios should alert. It is a part of security and auditing.

          If still you want to opt for removal function then you have to modify this script.
          You must have two inventory files to compare.
          File A: It should have autoscaling inventory which is currently in use by Nagios
          File B: Second file should have always updated autoscaling inventory.

          Logic: When any changes found in File B then Update File A and this File A update Nagios Hosts for autoscaling.
          If no changes found after comparing these 2 inventory files then no change in Nagios hosts for autoscaling.

          I tried here with simple logic and using old school technique. In advanced tech then you can use some nosql Database which keep updated inventory…something like that.

          I hope you understand the logic and you will take it forward by your own working style.

          Regards
          Sharad

  6. Hi,
    I think you should have declared “$_PROJECT-ELB_INSTANCES” variable instead of $MY_PROJECT in this awscli command below;

    _PROJECT_PUBLICIP=$(aws --region $_AWS_REGION_NAME ec2 describe-instances --instance-ids $MY_PROJECT --query 'Reservations[].Instances[].[PublicIpAddress]' --output text)

    Also i couldn’t see “$MY_PROJECT” variable where is declared ?

      • Thank you very much Sharad,
        You helped me prety much.

        Here is my own script;

        #!/bin/bash

        CURRENT_DIR=/usr/local/nagios/etc/objects

        ASG_NAMEs=`aws autoscaling describe-auto-scaling-groups --region eu-central-1 --query AutoScalingGroups[*].[AutoScalingGroupName] --output text`

        cat /dev/null > $CURRENT_DIR/hostgroups/ASG-hostgroups.cfg

        for ASG_NAME in $ASG_NAMEs
        do
        ASG_ELB=`aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name $ASG_NAME --region eu-central-1 --query AutoScalingGroups[*].[LoadBalancerNames] --output text`

        declare -a ELB_INSTANCE_IDS
        ELB_INSTANCE_IDS=(`aws elb describe-instance-health --load-balancer-name $ASG_ELB --region eu-central-1 --query 'InstanceStates[?State==\`InService\`].[InstanceId]' --output text`)

        echo "
        define hostgroup {
        alias ASG-$ASG_NAME Hosts
        hostgroup_name ASG-$ASG_NAME
        }" >> $CURRENT_DIR/hostgroups/ASG-hostgroups.cfg

        if [ ${#ELB_INSTANCE_IDS[@]} -gt 0 ]; then
        PRIVATE_IPs=`aws ec2 describe-instances --instance-ids ${ELB_INSTANCE_IDS[@]} --query 'Reservations[*].Instances[*].[PrivateIpAddress]' --region eu-central-1 --output text`

        if [ -f $CURRENT_DIR/hosts/ASG-$ASG_NAME.cfg ]; then
        cat /dev/null > $CURRENT_DIR/hosts/ASG-$ASG_NAME.cfg
        fi

        for IP in $PRIVATE_IPs
        do
        echo "
        define host {
        alias $ASG_NAME Host
        use linux-server
        host_name $ASG_NAME-$IP
        address $IP
        hostgroups ASG-$ASG_NAME
        }" >> $CURRENT_DIR/hosts/ASG-$ASG_NAME.cfg
        done

        HOST_GROUP_NAMES+="ASG-$ASG_NAME,"
        fi
        done

        echo "
        define service {
        hostgroup_name $HOST_GROUP_NAMES
        service_description Root Partition Usage
        check_command check_nrpe!-H \$HOSTADDRESS$ -c check_sda
        use generic-service
        }

        define service {
        hostgroup_name $HOST_GROUP_NAMES
        service_description Memory Usage
        check_command check_nrpe!-H \$HOSTADDRESS$ -c check_mem -t 60
        use generic-service
        }

        define service {
        hostgroup_name $HOST_GROUP_NAMES
        service_description CPU-Avg
        check_command check_nrpe!-H \$HOSTADDRESS$ -c check_cpu -t 60
        use generic-service
        }
        define service {
        hostgroup_name $HOST_GROUP_NAMES
        service_description Uptime
        check_command check_nrpe!-H \$HOSTADDRESS$ -p 5666 -t 60 -c check_uptime -a 0 0
        use generic-service
        }" > $CURRENT_DIR/services/ASG-services.cfg

        service nagios restart

  7. Hi Sharad!

    How are you?

    First of all, great work on this script!

    I’m trying to set it up, but I’m getting a few errors:

    [root@nagios objects]# ./nagios_asg.sh
    ./nagios_asg.sh: line 14: _PROJECT-ELB_INSTANCES=elb_instance: command not found
    ./nagios_asg.sh: line 15: _PROJECT-NAGIOS_EXISTING_FILES=nagios_existing_files: command not found
    ./nagios_asg.sh: line 18: _PROJECT-ELB_NAME=testelb-2011504061.us-east-1.elb.amazonaws.com: command not found

    Unknown options: -ELB_NAME
    grep: _INSTANCES: invalid context length argument

    Could you please tell me what I’m doing wrong?

    Thanks a lot!

    • Hi Dioga,

      Your shell is treating the variable assignment as a command, because a hyphen is not allowed in a shell variable name.
      elb_instance and nagios_existing_files are referred to here as files.

      Declare the variables with underscores instead, both in the assignments and wherever they are used. You can also give the absolute path where these files should be created.


      _PROJECT_ELB_INSTANCES=/Give/Absolute/PATH/elb_instance
      _PROJECT_NAGIOS_EXISTING_FILES=/Give/Absolute/PATH/nagios_existing_files

      Regards
      Sharad

  8. Hi ,

    Great work . But my auto scaling group policies include increasing and decreasing number of instances based of CPU load . So in case of decreasing the number of instances how will the nagios server detect it whether the instance is down or instance is terminated by autoscaling group

  9. Very well explained.

    How does this work for multiple AWS accounts?
    Do we use multiple copies of the same script?
    Or can we store the keys in a separate config file and call them separately for each individual AWS account?

    • Hello Sunil,

      Thank you for feedback. I hope you understand the logic how this is achieved. Your question is very good and I have worked in such requirement like multiple AWS account API call.
      You can make it either single or multiple script, it is up to your choice and managing script work. If you can make advanced bash script, given below is the logic.

      If you are looking for Single script, here is the basic roadmap. Idea is just call the FUNCTION.

      You can use, for…do loop or anything which is applicable.
      1. Create one file and keep all AWS account ACCESS and SECRET key in serial order eg.

      ACCOUNT1:AWS_KEY:SECRET_KEY
      ACCOUNT2:AWS_KEY:SECRET_KEY

      2. Create Function for this task

      3. By using EXPORT (setting environment), you can set the AWS access key and secret key. Also note, never forget to unset the environment at the end. The AWS CLI reads the keys from the variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

      export AWS_ACCESS_KEY_ID=your-aws-access-key-id
      export AWS_SECRET_ACCESS_KEY=your-aws-secret-key

      4. And in the end, call the FUNCTION within the loop.

      ## Declare the variable for your script.

      ## Declare the FUNCTION . Means this function should be able to complete the task. You have to work on it as per your cases.

      ## open loop with 'for'
      ## Get particular account AWS Access and Secret key value from other file , you can use grep and awk
      ## Set the AWS Access and Secret key with EXPORT , here is the sample

      ACCESS=$(grep ACCOUNT1 /opt/aws_accounts_file|awk -F: '{print $2}')  ### See file format which I suggested in serial no. 1
      SECRET=$(grep ACCOUNT1 /opt/aws_accounts_file|awk -F: '{print $3}')

      export AWS_ACCESS_KEY_ID=$ACCESS
      export AWS_SECRET_ACCESS_KEY=$SECRET
      
      ## Call the function 
      ## Close the loop
      
      
  10. Great thanks!
    But inside autoscaling istances, what i need install to make a connection with Nagios server?
