Setting up Nagios monitoring alerts for instances behind an AWS Elastic Load Balancer (ELB) is always a tricky part. In this post, we will learn how to monitor instances that are created by an autoscaling policy and run behind the ELB.
We expect our readers to have good knowledge of scripting, Nagios and AWS computing. This will help them quickly understand the scenario and the method used here.
Challenges
1. In AWS autoscaling, new instances are created automatically as per the defined conditions. So the system admin has the quite tedious job of adding each new instance to the Nagios server.
Ideas and Approaches to monitor AWS autoscaling group instances
Logic: In AWS Autoscaling, servers are launched from a common AMI. Hence, all of them have the same configuration and the same monitoring requirements.
Ideas and Possible Methods
Idea 1: Run a script from the Nagios server that keeps checking whether any new server has been launched in a particular AWS autoscaling group. This script creates the required Nagios configuration file on the Nagios server.
Idea 2: Run a script when the autoscaling server launches or boots. In this idea, the script creates the Nagios host config file at boot time and copies it over sftp to the desired location on the Nagios server. I applied this in practice and it worked. Later I found that the sftp transfer sometimes failed, for example because booting took too long or the connection to the server failed.
Idea 3: Automation tools like Puppet or Chef can also be used here. You only have to figure out how to get the information about a new instance launched in the autoscaling group.
Our Approach to setup Nagios monitoring on AWS Autoscaling
We will use Idea 1 as described in the section above. It is based on the following points.
(1) Services File: Create a common Nagios services file for autoscaling called ASG_services.cfg.
(2) Host Group File: Pre-define the hostgroup name with a simple configuration. We do not add members directly in this hostgroup file. Our hostgroup file for autoscaling is named ASG_groups.cfg.
(3) Host File For Each Instance: The script called nagios_asg.sh fetches how many instances are available behind the ELB, along with their instance IDs. The script also creates a Nagios host file for each instance ID on the Nagios server.
Let's start setting up Nagios monitoring for the AWS autoscaling group.
We are doing all the steps on the Nagios server.
(A) Create new directory dedicated for autoscaling inside nagios setup directory
mkdir -p /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A
Edit /usr/local/nagios/etc/nagios.cfg
vi /usr/local/nagios/etc/nagios.cfg
Add the new line given below.
cfg_dir=/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A
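If you repeat this step for several projects, a small helper can add the cfg_dir line only when it is missing. This is a minimal sketch; the function name ensure_cfg_dir is hypothetical, and the path of the nagios binary in the usage comment assumes a source install under /usr/local/nagios.

```shell
#!/bin/bash
# Append a cfg_dir line to the Nagios main config only if it is not already present.
# Usage: ensure_cfg_dir <path to nagios.cfg> <object directory>
ensure_cfg_dir() {
    local cfg_file="$1" obj_dir="$2"
    # -x matches the whole line, -F treats the text literally
    if ! grep -qxF "cfg_dir=$obj_dir" "$cfg_file"; then
        echo "cfg_dir=$obj_dir" >> "$cfg_file"
    fi
}

# Example with the paths used in this post:
#   ensure_cfg_dir /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A
# Verify the configuration before reloading Nagios (binary path is an assumption):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```

Running the helper twice leaves a single cfg_dir line, so it is safe to call from a provisioning script.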
(B) Create Nagios group file for AWS autoscaling:
Create a new Nagios host group file called ASG_groups.cfg inside the newly created directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A.
vi /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/ASG_groups.cfg
Now add the content given below to the ASG_groups.cfg file. You can replace the defined hostgroup_name as per your convenience.
define hostgroup {
    hostgroup_name    PROJECT_EXAMPLE_A_autoscale
    alias             PROJECT_EXAMPLE_A_autoscale
}
(C) Create Autoscaling common Nagios services file
Now create the common Nagios services file for autoscaling, called ASG_services.cfg, inside the directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/.
vi /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/ASG_services.cfg
Given below are the sample alerts which we have placed. You can add your own alerts. The main role is played by hostgroup_name, because it attaches every service to all hosts in the hostgroup, so do not skip it.
define service {
    hostgroup_name         PROJECT_EXAMPLE_A_autoscale
    service_description    Root Partition Usage
    check_command          check_nrpe!-H $HOSTADDRESS$ -c check_sda
    use                    generic-service
}

define service {
    hostgroup_name         PROJECT_EXAMPLE_A_autoscale
    service_description    Memory Usage
    check_command          check_nrpe!-H $HOSTADDRESS$ -c check_mem -t 60
    use                    generic-service
}

define service {
    hostgroup_name         PROJECT_EXAMPLE_A_autoscale
    service_description    CPU-Avg
    check_command          check_nrpe!-H $HOSTADDRESS$ -c check_cpu -t 60
    use                    generic-service
}

define service {
    hostgroup_name         PROJECT_EXAMPLE_A_autoscale
    service_description    Current Users
    check_command          check_nrpe!-H $HOSTADDRESS$ -c check_users -t 60
    use                    generic-service
}
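The services above pass the whole string after the "!" (for example "-H $HOSTADDRESS$ -c check_sda") as one argument, so they only work if the check_nrpe command hands $ARG1$ straight to the plugin. The post does not show its own check_nrpe definition, so the sketch below is an assumption about what it looks like; the helper name write_nrpe_command is hypothetical. If your Nagios already defines check_nrpe, do not add a second definition, as duplicates are reported as a config error.

```shell
#!/bin/bash
# Write a check_nrpe command definition that forwards the full argument string.
# Usage: write_nrpe_command <output .cfg file>
write_nrpe_command() {
    local out_file="$1"
    # The quoted 'EOF' keeps $USER1$ and $ARG1$ literal so Nagios expands them, not the shell
    cat > "$out_file" <<'EOF'
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe $ARG1$
}
EOF
}

# Example:
#   write_nrpe_command /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A/ASG_commands.cfg
```

The file name ASG_commands.cfg in the example is only a suggestion; any .cfg file inside the cfg_dir directory is picked up.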
(D) Create nagios_asg.sh script
Before proceeding, complete the following prerequisites.
Prerequisites for nagios_asg.sh script
1. Install the AWS CLI tool on the system:
You must already have pip installed on the system (follow this guide to install pip).
pip install awscli
2. Secret Keys/Access Keys Of IAM user :
Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
(a) In the navigation pane, choose Users.
(b) Choose the name of the user which can read the ELB and the instances behind the ELB, and then choose the Security Credentials tab. The user’s access keys and the status of each key are displayed.
Note: Only the user’s access key ID is visible. The secret access key can only be retrieved when creating the key.
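Before wiring the keys into the script, it helps to confirm that they are valid and that the IAM user can actually read the load balancer. The sketch below assumes the keys are already exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the function name verify_elb_access is hypothetical.

```shell
#!/bin/bash
# Check that the exported AWS keys work and can list the instances behind a classic ELB.
# Usage: verify_elb_access <region> <elb-name>
verify_elb_access() {
    local region="$1" elb_name="$2"
    # Step 1: are the keys accepted at all?
    aws sts get-caller-identity --output text > /dev/null || {
        echo "AWS credentials were rejected" >&2
        return 1
    }
    # Step 2: can this user read the load balancer? Prints the instance IDs behind it.
    aws --region "$region" elb describe-load-balancers \
        --load-balancer-names "$elb_name" \
        --query 'LoadBalancerDescriptions[].Instances[].InstanceId' \
        --output text
}

# Example:
#   verify_elb_access ap-southeast-1 your-project-elb-name
```

If the second call fails with an access error, the IAM user is missing read permission on Elastic Load Balancing.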
Create the Nagios AWS autoscaling group script called nagios_asg.sh. This script plays the important role in the whole setup. It creates a new Nagios host file for each instance running behind the ELB.
IMPORTANT Note: Define your own variable values in the script as per your setup. Define the IAM user access key and secret key in the file. In short, please read the script carefully and make the changes as per your requirement.
vi /opt/nagios_asg.sh
Copy and paste the content below into the script, then make your changes.
#!/bin/bash
#
# Author : Sharad Kumar Chhetri
# Version : 1.0
# Date : 4-Sept-2015
# Description : Check Autoscaling of the Project and create non available nagios host
# Blog : https://sharadchhetri.com

### Set AWS KEY of IAM user which can fetch the ELB information.
export AWS_ACCESS_KEY_ID=sdfhwqkjeluw/er29kasd
export AWS_SECRET_ACCESS_KEY=fdzfzdjfljzwerpiofn934kfe/3mr9nkds

## File names for storing output values.
## Shell variable names cannot contain a hyphen, so only underscores are used.
_PROJECT_ELB_INSTANCES=elb_instance
_PROJECT_NAGIOS_EXISTING_FILES=nagios_existing_files

### Check ELB Name from AWS console and set the value in variable _PROJECT_ELB_NAME
_PROJECT_ELB_NAME=your-project-elb-name

### Give the AWS region name where the ELB exists
_AWS_REGION_NAME=ap-southeast-1

### Give nagios host group name
_GROUP_NAME=PROJECT_EXAMPLE_A_autoscale

### Absolute path of nagios config directory set for Autoscaling Group, we have addressed this in Step A in this blog post
_CURRENT_DIR=/usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A

### Give environment name eg, production/staging/Dev/Test
_ENVIRONMENT_NAME_=Production

if [ ! -d "$_CURRENT_DIR/$_ENVIRONMENT_NAME_" ]; then
    mkdir -p "$_CURRENT_DIR/$_ENVIRONMENT_NAME_"
fi

### Function Assigned: append one host definition to the host file of the current instance
nagiosconfigfile () {
    {
        echo "define host{"
        echo "    use           linux-server"
        echo "    host_name     $_ENVIRONMENT_NAME_$MY_PROJECT"
        echo "    alias         $_ENVIRONMENT_NAME_$MY_PROJECT"
        echo "    hostgroups    $_GROUP_NAME"
        echo "    address       $_PROJECT_PUBLICIP"
        echo '}'
        echo ""
    } >> "$_CURRENT_DIR/$_ENVIRONMENT_NAME_$MY_PROJECT".cfg
}

### information of instances behind the ELB saved in file (Variable Name = _PROJECT_ELB_INSTANCES)
aws --region "$_AWS_REGION_NAME" elb describe-load-balancers --load-balancer-names "$_PROJECT_ELB_NAME" --output text | grep INSTANCES | awk '{print $2}' > "$_PROJECT_ELB_INSTANCES"

### List the instance names whose config files already exist in nagios dir.
### Empty lines are dropped, because an empty pattern would make "grep -v -f" below filter out every instance.
ls -1 "$_CURRENT_DIR" | sed -e "s/$_ENVIRONMENT_NAME_//g" -e 's/\.cfg//g' -e '/^$/d' > "$_PROJECT_NAGIOS_EXISTING_FILES"

## Compare two files (variable of _PROJECT_ELB_INSTANCES and _PROJECT_NAGIOS_EXISTING_FILES)
### Each instance ID that has no host file yet is read into the variable called MY_PROJECT
grep -v -f "$_PROJECT_NAGIOS_EXISTING_FILES" "$_PROJECT_ELB_INSTANCES" | while read MY_PROJECT
do
    _PROJECT_PUBLICIP=$(aws --region "$_AWS_REGION_NAME" ec2 describe-instances --instance-ids "$MY_PROJECT" --query 'Reservations[].Instances[].[PublicIpAddress]' --output text)
    ### Create the host file only when a public IP was returned (text output prints "None" when there is none)
    if [ -n "$_PROJECT_PUBLICIP" ] && [ "$_PROJECT_PUBLICIP" != "None" ]
    then
        cat /dev/null > "$_CURRENT_DIR/$_ENVIRONMENT_NAME_$MY_PROJECT".cfg
        nagiosconfigfile
        ##### If Nagios server is not CentOS 7 / RHEL 7 then use command 'service nagios restart' instead of systemctl.
        systemctl reload nagios
    fi
done

### Unset the variable value AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
## End Of Line ##
Give executable permission to the script.
chmod +x /opt/nagios_asg.sh
Explore the directory /usr/local/nagios/etc/objects/PROJECT_EXAMPLE_A. Inside it you will find some files and the environment directory. Change to the environment name directory which you defined in the script and you will see multiple Nagios host files, one for each instance.
Check the script by running it manually before adding it to crontab. In crontab, we run the nagios_asg.sh script every 4 minutes.
crontab -e -u root
*/4 * * * * /opt/nagios_asg.sh
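Before trusting the cron entry, a quick pre-flight check catches a script that is not executable or does not parse. This is a minimal sketch; the function name preflight is hypothetical. Note that bash -n only finds syntax errors, so a manual run with bash -x is still worth doing to trace the actual commands.

```shell
#!/bin/bash
# Basic checks to run once before scheduling a script in cron.
# Usage: preflight <script path>
preflight() {
    local script="$1"
    [ -x "$script" ] || { echo "$script is not executable" >&2; return 1; }
    # bash -n parses the script without executing any command in it
    bash -n "$script" || { echo "$script has syntax errors" >&2; return 1; }
}

# Example:
#   preflight /opt/nagios_asg.sh && bash -x /opt/nagios_asg.sh
```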
We have been running this method successfully with no issues so far. We tried to explain our logic and method as much as possible. Hope it will benefit you as well.
Hi Sharad,
Good day!
I’m also encountering this issue.
nagios_asg.sh: 14: nagios_asg.sh: _PROJECT-ELB_INSTANCES=/usr/local/nagios/etc/objects/Dev04/elb_instance: not found
nagios_asg.sh: 15: nagios_asg.sh: _PROJECT-NAGIOS_EXISTING_FILES=/usr/local/nagios/etc/objects/Dev04/nagios_existing_files: not found
nagios_asg.sh: 18: nagios_asg.sh: _PROJECT-ELB_NAME=MyApp-ALB: not found
Unknown options: -ELB_NAME
grep: _INSTANCES: invalid context length argument
could you enlighten us on how we should set the configuration for this?
Thank you
Hi Don,
I will check on this practically and update the same in blog.
Regards
Sharad
Hello,
nice post!
I’m trying to setup, but I’m getting a few errors:
# sh nagios_asg.sh
nagios_asg.sh: 14: nagios_asg.sh: _PROJECT-ELB_INSTANCES=/usr/local/nagios/etc/objects/Dev04/elb_instance: not found
nagios_asg.sh: 15: nagios_asg.sh: _PROJECT-NAGIOS_EXISTING_FILES=/usr/local/nagios/etc/objects/Dev04/nagios_existing_files: not found
nagios_asg.sh: 18: nagios_asg.sh: _PROJECT-ELB_NAME=MyApp-ALB: not found
Unknown options: -ELB_NAME
grep: _INSTANCES: invalid context length argument
Could you please help me what I’m doing wrong?
Thanks!
Hi Sharad, I need some help.
I need to set up Nagios monitoring for an autoscaling group, and I am having multiple problems:
1. There is more than one autoscaling group for which I have to set up automated monitoring.
2. The instances which are added by the script during autoscaling should also get removed at downscale.
3. The load balancer we are using is an Application Load Balancer, for which this command from your script does not work:
aws --region $_AWS_REGION_NAME elb describe-load-balancers --load-balancer-names $_PROJECT-ELB_NAME --output text|grep INSTANCES|awk '{print $2}' > $_PROJECT-ELB_INSTANCES
I tried changing the elb keyword to elbv2 in the command, but it didn’t work.
And I am unable to understand which files you are referring to in these lines:
_PROJECT-ELB_INSTANCES=elb_instance
_PROJECT-NAGIOS_EXISTING_FILES=nagios_existing_files
Please explain if possible
Thank you for your help in advance waiting for your reply.
Hi Rajesh,
I hope you understand the logic used here. The rest is a matter of doing it with the command line.
I do not have this setup right now, but if time allows I will check. Expect a delay.
Regards
Sharad
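For an Application Load Balancer, instances are registered with a target group rather than with the load balancer itself, so the instance list comes from elbv2 describe-target-health instead of elb describe-load-balancers. The sketch below is an assumption about how the script could be adapted, not something tested in this post; the function name alb_instance_ids is hypothetical, and it only returns instance IDs when the target group's target type is "instance".

```shell
#!/bin/bash
# Print the IDs of all targets registered in an ALB target group, one per line.
# Usage: alb_instance_ids <region> <target-group-arn>
alb_instance_ids() {
    local region="$1" tg_arn="$2"
    # Text output is tab-separated on one line, so convert tabs to newlines
    aws --region "$region" elbv2 describe-target-health \
        --target-group-arn "$tg_arn" \
        --query 'TargetHealthDescriptions[].Target.Id' \
        --output text | tr '\t' '\n'
}

# Example (the ARN is a placeholder you must replace):
#   alb_instance_ids ap-southeast-1 "<your-target-group-arn>" > elb_instance
```

The output has the same one-ID-per-line shape as the elb_instance file, so the rest of the comparison logic can stay as it is.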
Hi Sharad,
Can I use this script in a local environment instead of AWS autoscaling?
What changes are necessary if I have to use this script on a local Nagios server? I want to connect a Linux client to the Nagios server without making any changes on the server.
Hi Anish,
Yes, you can use this script, but you have to modify the lines wherever an AWS command is required.
If you could elaborate on your scenario, I can advise you in a better way.
Regards
Sharad
This is great! Any ideas on how to restart the nagios service ONLY if a new machine spins up? Say I currently have machines 1 and 2 up, then machine 3 spins up. So I want it to leave the configs for 1 and 2 alone so that uptime is not affected, but add machine 3.
Hi Jake,
This script adds a new machine when it spins up.
Could you elaborate more and explain your requirement?
Regards
Sharad
Apologies I don’t think I was very clear. So if I have two machines, and one dies, auto scaling will spin up another machine. So now I want Nagios to keep machine one, and machine three, and remove machine two. It looks like your script handles all but the removal? How would you add that?
Hi Jake,
I think I understand your query now. The removal function is intentionally not added in this script.
There is a strong reason: if anybody stops or removes the machine (EC2) by mistake, Nagios should alert. It is a part of security and auditing.
If you still want a removal function, you have to modify this script.
You must have two inventory files to compare.
File A: The autoscaling inventory which is currently in use by Nagios.
File B: The always up-to-date autoscaling inventory.
Logic: When a change is found in File B, update File A, and then update the Nagios hosts for autoscaling from File A.
If no change is found after comparing these 2 inventory files, the Nagios hosts for autoscaling stay unchanged.
I tried here with simple logic and an old-school technique. With more advanced tech you could use a NoSQL database which keeps the inventory updated, or something like that.
I hope you understand the logic and will take it forward in your own working style.
Regards
Sharad
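The two-file comparison described in this reply can be sketched as below. This is only an illustration of the logic, assuming both files hold one instance ID per line; the function name inventory_diff is hypothetical, and it only prints what would change, leaving the actual removal of host files (and the decision whether to remove at all) to you.

```shell
#!/bin/bash
# Compare the inventory Nagios currently uses (File A) with the fresh one from AWS (File B).
# Usage: inventory_diff <file_a> <file_b>
# Prints "add <id>" for IDs only in File B and "remove <id>" for IDs only in File A.
inventory_diff() {
    local file_a="$1" file_b="$2"
    # -F literal match, -x whole line, -f patterns from file, -v invert:
    # lines of the second file that do not appear in the first
    grep -vxFf "$file_a" "$file_b" | sed 's/^/add /'
    grep -vxFf "$file_b" "$file_a" | sed 's/^/remove /'
    return 0
}

# Example:
#   inventory_diff nagios_existing_files elb_instance
```

If the function prints nothing, the two inventories match and Nagios needs no reload, which also answers the earlier question about restarting only when something changed.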
Hi,
I think you should have used the "$_PROJECT-ELB_INSTANCES" variable instead of $MY_PROJECT in the awscli command below:
_PROJECT_PUBLICIP=$(aws --region $_AWS_REGION_NAME ec2 describe-instances --instance-ids $MY_PROJECT --query 'Reservations[].Instances[].[PublicIpAddress]' --output text)
Also, I couldn’t see where the "$MY_PROJECT" variable is declared.
Hi Alican,
MY_PROJECT is declared in the "while read MY_PROJECT" line, which reads each instance ID from the output of the grep command before it.
Regards
Sharad
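For readers with the same question, here is a small stand-alone illustration of how "while read" assigns a variable: every line arriving on the pipe is stored in MY_PROJECT for one pass of the loop. The function name list_new_instances and the echoed message are made up for this demo.

```shell
#!/bin/bash
# Demo: each argument is sent down the pipe as one line,
# and "read" stores that line in MY_PROJECT for one loop pass.
list_new_instances() {
    printf '%s\n' "$@" | while read -r MY_PROJECT
    do
        echo "would create host file for $MY_PROJECT"
    done
}

# Example:
#   list_new_instances i-0aaa i-0bbb
```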
Thank you very much Sharad,
You helped me a lot.
Here is my own script:
#!/bin/bash
CURRENT_DIR=/usr/local/nagios/etc/objects
ASG_NAMEs=`aws autoscaling describe-auto-scaling-groups --region eu-central-1 --query 'AutoScalingGroups[*].[AutoScalingGroupName]' --output text`
cat /dev/null > $CURRENT_DIR/hostgroups/ASG-hostgroups.cfg
for ASG_NAME in $ASG_NAMEs
do
    ASG_ELB=`aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name $ASG_NAME --region eu-central-1 --query 'AutoScalingGroups[*].[LoadBalancerNames]' --output text`
    declare -a ELB_INSTANCE_IDS
    ELB_INSTANCE_IDS=(`aws elb describe-instance-health --load-balancer-name $ASG_ELB --region eu-central-1 --query 'InstanceStates[?State==\`InService\`].[InstanceId]' --output text`)
    echo "
define hostgroup {
    alias ASG-$ASG_NAME Hosts
    hostgroup_name ASG-$ASG_NAME
}" >> $CURRENT_DIR/hostgroups/ASG-hostgroups.cfg
    if [ ${#ELB_INSTANCE_IDS[@]} -gt 0 ]; then
        PRIVATE_IPs=`aws ec2 describe-instances --instance-ids ${ELB_INSTANCE_IDS[@]} --query 'Reservations[*].Instances[*].[PrivateIpAddress]' --region eu-central-1 --output text`
        if [ -f $CURRENT_DIR/hosts/ASG-$ASG_NAME.cfg ]; then
            cat /dev/null > $CURRENT_DIR/hosts/ASG-$ASG_NAME.cfg
        fi
        for IP in $PRIVATE_IPs
        do
            echo "
define host {
    alias $ASG_NAME Host
    use linux-server
    host_name $ASG_NAME-$IP
    address $IP
    hostgroups ASG-$ASG_NAME
}" >> $CURRENT_DIR/hosts/ASG-$ASG_NAME.cfg
        done
        HOST_GROUP_NAMES+="ASG-$ASG_NAME,"
    fi
done
echo "
define service {
    hostgroup_name $HOST_GROUP_NAMES
    service_description Root Partition Usage
    check_command check_nrpe!-H \$HOSTADDRESS$ -c check_sda
    use generic-service
}
define service {
    hostgroup_name $HOST_GROUP_NAMES
    service_description Memory Usage
    check_command check_nrpe!-H \$HOSTADDRESS$ -c check_mem -t 60
    use generic-service
}
define service {
    hostgroup_name $HOST_GROUP_NAMES
    service_description CPU-Avg
    check_command check_nrpe!-H \$HOSTADDRESS$ -c check_cpu -t 60
    use generic-service
}
define service {
    hostgroup_name $HOST_GROUP_NAMES
    service_description Uptime
    check_command check_nrpe!-H \$HOSTADDRESS$ -p 5666 -t 60 -c check_uptime -a 0 0
    use generic-service
}" > $CURRENT_DIR/services/ASG-services.cfg
service nagios restart
Welcome Alican!
You understand the logic pretty well. Thank you for sharing your script with our blog readers.
Regards
Sharad
You are welcome. Thank you. I really appreciate it.
Hi Sharad!
How are you?
First of all, great work on this script!
I’m trying to set it up, but I’m getting a few errors:
[root@nagios objects]# ./nagios_asg.sh
./nagios_asg.sh: line 14: _PROJECT-ELB_INSTANCES=elb_instance: command not found
./nagios_asg.sh: line 15: _PROJECT-NAGIOS_EXISTING_FILES=nagios_existing_files: command not found
./nagios_asg.sh: line 18: _PROJECT-ELB_NAME=testelb-2011504061.us-east-1.elb.amazonaws.com: command not found
Unknown options: -ELB_NAME
grep: _INSTANCES: invalid context length argument
Could you please tell me what I’m doing wrong?
Thanks a lot!
Hi Dioga,
The shell is treating these lines as commands because a variable name cannot contain a hyphen. Rename the variables using underscores only, and update every place in the script where they are referenced.
elb_instance and nagios_existing_files are referred to here as files.
You can also give the absolute path where these files should be created.
_PROJECT_ELB_INSTANCES=/Give/Absolute/PATH/elb_instance
_PROJECT_NAGIOS_EXISTING_FILES=/Give/Absolute/PATH/nagios_existing_files
Regards
Sharad
Hi ,
Great work. But my autoscaling group policies include increasing and decreasing the number of instances based on CPU load. So when the number of instances decreases, how will the Nagios server detect whether an instance is down or was terminated by the autoscaling group?
Hi Ruben,
Work with the CloudTrail log to find out whether the server was stopped by a user or by the autoscaling group.
Regards
Sharad
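One way to follow this suggestion is to look up the recent CloudTrail events recorded for an instance ID and read who issued them. This is a minimal sketch, assuming CloudTrail event history is available in the account and the IAM user may call cloudtrail lookup-events; the function name instance_events is hypothetical, and how a termination by the autoscaling group appears in the user column should be checked against your own event history.

```shell
#!/bin/bash
# List recent CloudTrail events that reference one EC2 instance.
# Usage: instance_events <region> <instance-id>
# Prints one line per event: time, event name (e.g. TerminateInstances), user name.
instance_events() {
    local region="$1" instance_id="$2"
    aws --region "$region" cloudtrail lookup-events \
        --lookup-attributes "AttributeKey=ResourceName,AttributeValue=$instance_id" \
        --query 'Events[].[EventTime,EventName,Username]' \
        --output text
}

# Example (the instance ID is a placeholder):
#   instance_events ap-southeast-1 i-0123456789abcdef0 | grep -E 'TerminateInstances|StopInstances'
```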
Very well explained.
How does this work for multiple AWS accounts?
Do we use multiple copies of the same script?
Or can we store the keys in a separate config file and call them separately for each individual AWS account?
Hello Sunil,
Thank you for the feedback. I hope you understand the logic of how this is achieved. Your question is very good, and I have worked on such a requirement, with API calls to multiple AWS accounts.
You can make it either a single script or multiple scripts; it is up to you and how you manage your scripts. If you can write advanced bash scripts, the logic is given below.
If you are looking for a single script, here is the basic roadmap. The idea is just to call a FUNCTION.
You can use a for...do loop or anything else which is applicable.
1. Create one file and keep all AWS account ACCESS and SECRET keys in serial order, e.g.
ACCOUNT1:AWS_KEY:SECRET_KEY
ACCOUNT2:AWS_KEY:SECRET_KEY
2. Create a function for this task.
3. By using EXPORT (setting the environment), you can set the AWS access key and secret key. Also, never forget to unset the environment at the end.
4. Finally, call the FUNCTION within the loop.
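The roadmap above can be sketched as a small loop. This is one possible shape under stated assumptions: the accounts file uses the ACCOUNT:KEY:SECRET format shown in the reply, for_each_account is a hypothetical helper name, and the command you pass in (for example a function that wraps the nagios_asg.sh logic) receives the account name as its last argument.

```shell
#!/bin/bash
# Run a command once per AWS account, with that account's keys exported.
# Usage: for_each_account <accounts file> <command> [args...]
for_each_account() {
    local accounts_file="$1"
    shift
    local account key secret
    # Read the file on descriptor 3 so the command can still use standard input
    while IFS=: read -r account key secret <&3
    do
        [ -n "$account" ] || continue    # skip blank lines
        export AWS_ACCESS_KEY_ID="$key" AWS_SECRET_ACCESS_KEY="$secret"
        "$@" "$account"
        # Never leave the keys in the environment after the call
        unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
    done 3< "$accounts_file"
}

# Example (file path and function name are placeholders):
#   chmod 600 /opt/aws_accounts
#   for_each_account /opt/aws_accounts update_nagios_hosts
```

Keeping the accounts file readable only by its owner matters here, since it holds every secret key in plain text.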
Great thanks!
But inside the autoscaling instances, what do I need to install to make a connection with the Nagios server?
Hi Stefano,
You can install NRPE. By default it runs on port 5666, so allow this port in the security group so that your Nagios server can reach NRPE.
Regards
Sharad
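Opening the NRPE port can also be done from the command line. The sketch below assumes the autoscaling instances share one security group and that the Nagios server has a fixed IP; the function name allow_nrpe and the example values are hypothetical.

```shell
#!/bin/bash
# Allow the Nagios server to reach NRPE (TCP 5666) on instances in a security group.
# Usage: allow_nrpe <region> <security-group-id> <nagios-server-ip>
allow_nrpe() {
    local region="$1" sg_id="$2" nagios_ip="$3"
    # /32 restricts the rule to the single Nagios server address
    aws --region "$region" ec2 authorize-security-group-ingress \
        --group-id "$sg_id" \
        --protocol tcp --port 5666 \
        --cidr "$nagios_ip/32"
}

# Example (group ID and IP are placeholders):
#   allow_nrpe ap-southeast-1 sg-0123456789abcdef0 203.0.113.10
```

Restricting the rule to the Nagios server address keeps port 5666 closed to everyone else.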