Cassandra backup script on linux system

This post shares a Cassandra backup script for Linux systems. Backups are an important part of any system: they let you get your data back if it is deleted or lost for any reason.

The script has two sections, matching the two parts of a Cassandra backup:

1. SCHEMA Backup
2. Taking SNAPSHOT Backup

NOTE: You have to run the script on each Cassandra node.
You can run it manually, or set it up as a cron job so that it runs periodically.
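
As a rough sketch of the cron setup, the snippet below builds a crontab entry that runs the script every day at 02:00 and appends its output to a log file. The script location, the log location, and the schedule are assumptions for illustration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical paths and schedule; adjust them for your node.
BACKUP_SCRIPT=/opt/scripts/cassandra_backup.sh
BACKUP_LOG=/var/log/cassandra_backup.log

# Build one crontab line: minute hour day-of-month month day-of-week command
cron_entry() {
    printf '0 2 * * * %s >> %s 2>&1\n' "$BACKUP_SCRIPT" "$BACKUP_LOG"
}

# Print the entry. To install it for the current user, append it to the
# existing crontab:
#   ( crontab -l 2>/dev/null; cron_entry ) | crontab -
cron_entry
```

Cron runs jobs with a minimal PATH, so $(which nodetool) and cqlsh may not resolve there. Setting _NODETOOL to an absolute path, as the script comments suggest, is the safer choice for scheduled runs.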

You can download the backup script from my github link.

Create Cassandra Backup Script:

1. Create a file called cassandra_backup.sh.
2. Make it executable: chmod +x cassandra_backup.sh
3. Copy and paste the content given below into cassandra_backup.sh.

#!/bin/bash
#
# Author: Sharad Kumar Chhetri
# Date : 27-April-2015
# Description : The backup script completes the backup in 2 phases -
#  1. First Phase: Taking backup of Keyspace SCHEMA
#  2. Second Phase: Taking snapshot of keyspaces
#

## The variables below must be set by the system admin ##
# For _NODETOOL, you can replace $(which nodetool) with the absolute path of the nodetool command.
#

_BACKUP_DIR=/backup
_DATA_DIR=/var/lib/cassandra/data
_NODETOOL=$(which nodetool)

## Do not edit the variables below ##

_TODAY_DATE=$(date +%F)
_BACKUP_SNAPSHOT_DIR="$_BACKUP_DIR/$_TODAY_DATE/SNAPSHOTS"
_BACKUP_SCHEMA_DIR="$_BACKUP_DIR/$_TODAY_DATE/SCHEMA"
_SNAPSHOT_DIR=$(find $_DATA_DIR -type d -name snapshots)
_SNAPSHOT_NAME=snp-$(date +%F-%H%M-%S)
_DATE_SCHEMA=$(date +%F-%H%M-%S)

###### Create / check backup Directory ####

if [ -d  "$_BACKUP_SCHEMA_DIR" ]
then
echo "$_BACKUP_SCHEMA_DIR already exists"
else
mkdir -p "$_BACKUP_SCHEMA_DIR"
fi

if [ -d  "$_BACKUP_SNAPSHOT_DIR" ]
then
echo "$_BACKUP_SNAPSHOT_DIR already exists"
else
mkdir -p "$_BACKUP_SNAPSHOT_DIR"
fi



##################### SECTION 1 : SCHEMA BACKUP ############################################ 

## List All Keyspaces
cqlsh -e "DESC KEYSPACES" | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | sed '/^$/d' > Keyspace_name_schema.cql

#_KEYSPACE_NAME=$(cat Keyspace_name_schema.cql)

## Create one directory per keyspace inside the backup SCHEMA directory.
for i in $(cat Keyspace_name_schema.cql)
do
if [ -d "$_BACKUP_SCHEMA_DIR/$i" ]
then
echo "$i directory exists"
else
mkdir -p "$_BACKUP_SCHEMA_DIR/$i"
fi
done

## Take SCHEMA Backup - All Keyspace and All tables
for VAR_KEYSPACE in $(cat Keyspace_name_schema.cql)
do
cqlsh -e "DESC KEYSPACE  $VAR_KEYSPACE" > "$_BACKUP_SCHEMA_DIR/$VAR_KEYSPACE/$VAR_KEYSPACE"_schema-"$_DATE_SCHEMA".cql 
done


##################### END OF LINE ---- SECTION 1 : SCHEMA BACKUP #####################

###### Create snapshots for all keyspaces
echo "creating snapshots for all keyspaces ....."
$_NODETOOL snapshot -t $_SNAPSHOT_NAME

###### Get Snapshot directory paths (relative to _DATA_DIR)
find "$_DATA_DIR" -type d -name snapshots | awk '{gsub("'$_DATA_DIR'", "");print}' > snapshot_dir_list


## Create the matching directories inside the backup directory, per keyspace and table.
for i in `cat snapshot_dir_list`
do
if [ -d "$_BACKUP_SNAPSHOT_DIR/$i" ]
then
echo "$i directory exists"
else
mkdir -p "$_BACKUP_SNAPSHOT_DIR/$i"
echo "$i directory is created"
fi
done

### Copy default Snapshot dir to backup dir

find $_DATA_DIR -type d -name $_SNAPSHOT_NAME > snp_dir_list

for SNP_VAR in `cat snp_dir_list`;
do
## Trimming _DATA_DIR from the snapshot path
_SNP_PATH_TRIM=`echo $SNP_VAR|awk '{gsub("'$_DATA_DIR'", "");print}'`

cp -prvf "$SNP_VAR" "$_BACKUP_SNAPSHOT_DIR$_SNP_PATH_TRIM";

done
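
The trimming step in the copy loop can also be done without awk. The sketch below uses bash prefix removal to map a snapshot directory under _DATA_DIR to its destination under the backup directory. The function name backup_target, the backup date, and the sample snapshot path are hypothetical and only serve the illustration.

```shell
#!/bin/bash
_DATA_DIR=/var/lib/cassandra/data
_BACKUP_SNAPSHOT_DIR=/backup/2015-04-27/SNAPSHOTS   # example value only

# Map a snapshot directory under _DATA_DIR to its destination in the backup.
# ${path#"$_DATA_DIR"} strips the literal _DATA_DIR prefix; no regex is involved.
backup_target() {
    local path=$1
    printf '%s%s\n' "$_BACKUP_SNAPSHOT_DIR" "${path#"$_DATA_DIR"}"
}

# Hypothetical snapshot path, shaped like the output of the find command above
backup_target "$_DATA_DIR/mykeyspace/users-1a2b3c/snapshots/snp-2015-04-27-0200-00"
```

Because the prefix is matched literally, this variant is not affected by characters in the data directory path that awk's gsub would treat as regular expression syntax.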

Your questions and comments will help me improve the script.

15 thoughts on “Cassandra backup script on linux system”

  1. Wonderful script. Saved my day. Thanks.

    I have added the lines below as an enhancement:
    ###### Deleting old snapshots for all keyspaces
    $_NODETOOL clearsnapshot

    Below is my restore script for schema

    #!/bin/bash
    # Date : SEP 2018
    # Author: Harpreet
    # Description : Restore Keyspace SCHEMA
    
    
    _BACKUP_DIR=/mnt/hdd/cass_backup
    _BACKUP_SCHEMA_DIR="$_BACKUP_DIR/SCHEMA"
    
    
    cd $_BACKUP_SCHEMA_DIR
    
    for i in $(cat Keyspace_name_schema.cql | sed  's/system_schema//' | sed  's/system_auth//' | sed  's/system_distributed//' |sed  's/system_traces//' | sed 's/system//')
    do
    if [ -d $i ]
    then
    cqlsh -f $i/*
    else
    echo "$i schema does not exist"
    fi
    done
    
    • @Harpreet

      Hi

      Is it possible for you to share your backup and restore script files here for my reference?

  2. Thanks, the scripts saved a lot of time.

    If you need a backup without the system-related keyspaces, use the following.

    For the schema, at line 48 in the github link:
    cqlsh -e "DESC KEYSPACES" | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | sed '/^$/d' | sed -e s/system_schema//g | sed -e s/system//g | sed -e s/_traces//g | sed -e s/_auth//g | sed -e s/_distributed//g > Keyspace_name_schema.cql

    For snapshots, at line 77 in the github link:
    _SNAPSHOT_DIR_LIST=`find $_DATA_DIR -type d -name snapshots|awk '{gsub("'$_DATA_DIR'", "");print}' | grep -v "system" > snapshot_dir_list`
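
One caveat with the sed substitutions above: they delete the text "system" wherever it appears, so a user keyspace such as subsystem_logs would be written out as sub_logs. A possible alternative, sketched below, puts one keyspace name on each line and drops system keyspaces by whole-name match. The function name list_user_keyspaces and the sample keyspace names are hypothetical, and treating every name that starts with system_ as a system keyspace is an assumption.

```shell
#!/bin/bash
# Split whitespace-separated keyspace names (as printed by "DESC KEYSPACES")
# into one name per line, then drop "system" and "system_*" by whole-name match.
list_user_keyspaces() {
    tr -s '[:space:]' '\n' | grep -v -E '^(system|system_.*)$' | sed '/^$/d'
}

# Sample input standing in for the output of: cqlsh -e "DESC KEYSPACES"
printf '\nsystem_schema  system_auth  system  subsystem_logs\nsystem_distributed  analytics  system_traces\n\n' \
    | list_user_keyspaces
```

In the backup script this filter would sit in the pipeline just before the redirect into Keyspace_name_schema.cql.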

  3. The perl/sed commands appeared to truncate some of the keyspace names. If anyone runs into trouble, try this alternative:

    cqlsh -e "DESC KEYSPACES" | awk '{print $1 "\n" $2 "\n" $3 "\n"}' | grep -v "^$" | sort > Keyspace_name_schema.cql

  4. Great script. I was also able to make some very small changes and use it in my Cassandra environment, but I am having issues restoring the backup. Do you have a script to restore as well, or do you have any advice?

    • Hi,

      I have not created a Cassandra restore script for the blog. This script takes a backup of two things:
      1. KEYSPACE SCHEMA (as a cql file, using the cqlsh command)
      2. KEYSPACE DATA (as a snapshot)

      To Restore SCHEMA:

      1. Copy the SCHEMA BACKUP file (i.e. the KEYSPACE SCHEMA cql file) to the current login user's home directory.
      2. cd ~
      3. Log in to the cqlsh console by running the cqlsh command.
      4. In the cqlsh console, use the source command with the KEYSPACE SCHEMA backup file name.

      cqlsh> source 'KEYSPACE_SCHEMA_BACKUP.cql'

      In the above, replace KEYSPACE_SCHEMA_BACKUP.cql with the name of your Cassandra schema backup file.

      To Restore SNAPSHOT:
      1. Copy the KEYSPACE snapshot backup directory to the relevant Cassandra KEYSPACE data directory.
      I assume you know where the Cassandra data directory is located on your system. In general it is at /var/lib/cassandra/data/ .

      Try it on your test machine first.

      Regards
      Sharad
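
The snapshot restore step can be sketched as a small shell function. It assumes the backup layout this script produces (SNAPSHOTS/keyspace/table-uuid/snapshots/snapshot-name) and copies the files inside one snapshot into the matching table directory under the data directory. The function name restore_snapshot is hypothetical, and the sketch assumes the table already exists on the node with the same directory name.

```shell
#!/bin/bash
# Copy the files of one backed-up snapshot into the live table directory.
# $1 = snapshot directory in the backup, shaped like
#      <backup>/SNAPSHOTS/<keyspace>/<table-uuid>/snapshots/<snapshot-name>
# $2 = Cassandra data directory (often /var/lib/cassandra/data)
restore_snapshot() {
    local snap_dir=${1%/} data_dir=${2%/}
    local table_dir keyspace_dir target
    table_dir=$(dirname "$(dirname "$snap_dir")")    # .../<keyspace>/<table-uuid>
    keyspace_dir=$(dirname "$table_dir")             # .../<keyspace>
    target="$data_dir/$(basename "$keyspace_dir")/$(basename "$table_dir")"
    mkdir -p "$target"
    # Copy the files inside the snapshot, not the snapshot directory itself
    cp -p "$snap_dir"/* "$target"/
    echo "$target"
}

# Example call (placeholders left as-is):
#   restore_snapshot /backup/<date>/SNAPSHOTS/<keyspace>/<table-uuid>/snapshots/<snapshot-name> /var/lib/cassandra/data
```

After the copy, the files should be owned by the user that runs Cassandra, and the node needs to load them, either through a restart or with nodetool refresh for that keyspace and table. As advised above, try this on a test machine first.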

      • Hi,

        Very helpful script, but I am a little confused about which directory I need to copy. Do I need to copy the content of the SNAPSHOTS directory? When the backup script ran, the following structure was created: BACKUP_FOLDER –> folders called SCHEMA and SNAPSHOTS –> within SNAPSHOTS –> keyspace folder –> table-name-with-uuid folder –> within this again a folder called snapshots –> within this I see multiple folders named snp-datexxx with different timestamps, as I scheduled the script on cron for every 3 hours. Now my question is whether I need to copy this snp-datexxx directory –> /var/lib/cassandra/data/anlaytic folder (where anlaytic is my keyspace name).

  5. The script is nice. I have Cassandra 1.2, where the cqlsh command uses the -f option; it does not accept the -e option with foreground commands. Let me know if you have a script that works for earlier versions of Cassandra. Thanks.

    • Hello Friend,

      You can make changes to the script. As far as I know, the old version has two commands, cqlsh and cassandra-cli. Check whether either command works for you.
      I have not created a backup script for old versions of Cassandra.

      Regards
      Sharad
