In this post we are sharing a Cassandra backup script for Linux systems. Backups are an important part of any system: they let you get your data back if it is deleted or lost for any reason.
The script has two sections, matching the two parts of a Cassandra backup:
1. SCHEMA backup
2. SNAPSHOT backup
NOTE: You have to run the script on each Cassandra node.
You can run this script manually or schedule it as a cron job so that it runs periodically.
You can download the backup script from my GitHub link.
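For example, a crontab entry like the following (added with crontab -e as the user that runs the backups) would run the script every night at 02:00. The script path and the log file location are assumptions; adjust them to your setup. Cron runs jobs with a minimal PATH, so setting _NODETOOL to the absolute path of nodetool, as the script's comments suggest, avoids "command not found" errors, and cqlsh, which the script calls by name, must also be reachable. Cron also starts jobs in the user's home directory, so the temporary list files the script writes will be created there.

```
# minute hour day-of-month month day-of-week  command
# Hypothetical paths: point them at your copy of the script and your preferred log file.
0 2 * * * /opt/scripts/cassandra_backup.sh >> /var/log/cassandra_backup.log 2>&1
```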
Create Cassandra Backup Script:
1. Create a file called cassandra_backup.sh.
2. Make it executable: chmod +x cassandra_backup.sh
3. Copy and paste the content given below into cassandra_backup.sh.
```bash
#!/bin/bash
#
# Author: Sharad Kumar Chhetri
# Date : 27-April-2015
# Description : The backup script completes the backup in 2 phases -
# 1. First Phase: Taking backup of Keyspace SCHEMA
# 2. Second Phase: Taking snapshot of keyspaces
#
## The variables below must be filled in by the system admin ##
# For _NODETOOL, you can replace $(which nodetool) with the absolute path of the nodetool command.
#
_BACKUP_DIR=/backup
_DATA_DIR=/var/lib/cassandra/data
_NODETOOL=$(which nodetool)

## Do not edit the variables below ##
_TODAY_DATE=$(date +%F)
_BACKUP_SNAPSHOT_DIR="$_BACKUP_DIR/$_TODAY_DATE/SNAPSHOTS"
_BACKUP_SCHEMA_DIR="$_BACKUP_DIR/$_TODAY_DATE/SCHEMA"
_SNAPSHOT_NAME=snp-$(date +%F-%H%M-%S)
_DATE_SCHEMA=$(date +%F-%H%M-%S)

## Stop early if nodetool was not found (e.g. because cron's PATH is minimal)
if [ -z "$_NODETOOL" ]
then
    echo "nodetool not found: set _NODETOOL to its absolute path" >&2
    exit 1
fi

###### Create / check backup directories ####
if [ -d "$_BACKUP_SCHEMA_DIR" ]
then
    echo "$_BACKUP_SCHEMA_DIR already exists"
else
    mkdir -p "$_BACKUP_SCHEMA_DIR"
fi

if [ -d "$_BACKUP_SNAPSHOT_DIR" ]
then
    echo "$_BACKUP_SNAPSHOT_DIR already exists"
else
    mkdir -p "$_BACKUP_SNAPSHOT_DIR"
fi

##################### SECTION 1 : SCHEMA BACKUP #####################
## List all keyspaces (the perl filter strips terminal color codes from the cqlsh output)
cqlsh -e "DESC KEYSPACES" | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | sed '/^$/d' > Keyspace_name_schema.cql

## Create one directory per keyspace inside the backup SCHEMA directory
for i in $(cat Keyspace_name_schema.cql)
do
    if [ -d "$_BACKUP_SCHEMA_DIR/$i" ]
    then
        echo "$i directory exists"
    else
        mkdir -p "$_BACKUP_SCHEMA_DIR/$i"
    fi
done

## Take SCHEMA backup - all keyspaces and all tables
for VAR_KEYSPACE in $(cat Keyspace_name_schema.cql)
do
    cqlsh -e "DESC KEYSPACE $VAR_KEYSPACE" > "$_BACKUP_SCHEMA_DIR/$VAR_KEYSPACE/${VAR_KEYSPACE}_schema-$_DATE_SCHEMA.cql"
done
##################### END OF SECTION 1 : SCHEMA BACKUP #####################

##################### SECTION 2 : SNAPSHOT BACKUP #####################
###### Create snapshots for all keyspaces
echo "creating snapshots for all keyspaces ....."
"$_NODETOOL" snapshot -t "$_SNAPSHOT_NAME"

###### List snapshot directories relative to _DATA_DIR (e.g. /<keyspace>/<table>-<uuid>/snapshots)
find "$_DATA_DIR" -type d -name snapshots \
    | awk -v dir="$_DATA_DIR" 'index($0, dir) == 1 { print substr($0, length(dir) + 1) }' > snapshot_dir_list

## Create the same directory tree inside the backup SNAPSHOTS directory
for i in $(cat snapshot_dir_list)
do
    if [ -d "$_BACKUP_SNAPSHOT_DIR$i" ]
    then
        echo "$i directory exists"
    else
        mkdir -p "$_BACKUP_SNAPSHOT_DIR$i"
        echo "$i directory is created"
    fi
done

### Copy this run's snapshot directories to the backup directory
find "$_DATA_DIR" -type d -name "$_SNAPSHOT_NAME" > snp_dir_list
for SNP_VAR in $(cat snp_dir_list)
do
    ## Trim the _DATA_DIR prefix from the path
    _SNP_PATH_TRIM=${SNP_VAR#"$_DATA_DIR"}
    cp -prvf "$SNP_VAR" "$_BACKUP_SNAPSHOT_DIR$_SNP_PATH_TRIM"
done
```
Your questions and comments will help me improve the script.
Wonderful script. Saved my day. Thanks.
I have added the following as an enhancement:
###### Deleting old snapshots for all keyspaces
$_NODETOOL clearsnapshot
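On the Cassandra versions this post targets, a plain nodetool clearsnapshot removes every snapshot on the node, not only the one the script has just copied. A narrower variant is to clear only this run's snapshot with the -t option, after the copy loop has finished. A sketch, assuming it is appended to cassandra_backup.sh and called with that script's _NODETOOL and _SNAPSHOT_NAME values; clear_run_snapshot is a hypothetical helper name:

```bash
#!/bin/bash
# Sketch (hypothetical helper): remove only the snapshot created by this backup run,
# once it has been copied. Arguments: path to nodetool, snapshot tag.
clear_run_snapshot() {
  local nodetool="$1" snapshot_name="$2"
  if [ -z "$snapshot_name" ]; then
    # An empty tag would turn this into a plain "clearsnapshot", removing ALL snapshots.
    echo "clear_run_snapshot: snapshot name is empty, refusing to run" >&2
    return 1
  fi
  "$nodetool" clearsnapshot -t "$snapshot_name"
}
```

At the end of the script it would be called as clear_run_snapshot "$_NODETOOL" "$_SNAPSHOT_NAME".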
Below is my restore script for schema
@Harpreet
Hi
Is it possible for you to share your backup and restore script files here for my reference?
Thank You for the wonderful script.
Thanks, the scripts saved a lot of time.
If you need to take the backup without the system keyspaces, use the following.
For the schema, at line 48 of the GitHub version:
cqlsh -e "DESC KEYSPACES" |perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | sed '/^$/d' | sed -e s/system_schema//g | sed -e s/system//g | sed -e s/_traces//g | sed -e s/_auth//g | sed -e s/_distributed//g > Keyspace_name_schema.cql
For the snapshots, at line 77 of the GitHub version:
_SNAPSHOT_DIR_LIST=`find $_DATA_DIR -type d -name snapshots|awk '{gsub("'$_DATA_DIR'", "");print}' | grep -v "system" > snapshot_dir_list`
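The sed chain above deletes the substrings system, _traces, _auth and _distributed wherever they appear, so a user keyspace such as my_system_data would be altered as well, and grep -v "system" likewise drops any snapshot path that merely contains the word. A sketch that filters by exact keyspace name instead; the list of system keyspace names is an assumption based on common Cassandra 3.x/4.x releases, so check it against DESC KEYSPACES on your cluster. user_keyspaces and user_snapshot_dirs are hypothetical helper names:

```bash
#!/bin/bash
# Assumed system keyspace names (verify against "DESC KEYSPACES" on your version).
SYSTEM_KS_REGEX='system|system_schema|system_auth|system_distributed|system_traces|system_views|system_virtual_schema'

# Reads keyspace names in any whitespace-separated layout on stdin and
# prints the non-system ones, one per line (exact-name match via grep -x).
user_keyspaces() {
  tr -s '[:space:]' '\n' | sed '/^$/d' | grep -vxE "$SYSTEM_KS_REGEX"
}

# Reads snapshot paths relative to the data dir (e.g. /ks/table-uuid/snapshots)
# on stdin and drops those whose first path component is a system keyspace.
user_snapshot_dirs() {
  grep -vE "^/($SYSTEM_KS_REGEX)/"
}
```

In the script these would sit after the perl filter and after the awk step respectively, e.g. ... | sed '/^$/d' | user_keyspaces > Keyspace_name_schema.cql.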
The perl/sed commands appeared to truncate some of the keyspace names. If anyone runs into trouble, try this alternative:
cqlsh -e "DESC KEYSPACES" | awk '{print $1 "\n" $2 "\n" $3 "\n"}' | grep -v "^$" | sort > Keyspace_name_schema.cql
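The awk alternative prints only the first three columns of each line, so if cqlsh lays the keyspace names out in more than three columns, the extra names are silently skipped. A sketch that splits on any whitespace instead, so it does not depend on the column count; it assumes no color escape codes remain in the cqlsh output (keep the perl filter in front if they do), and split_keyspace_names is a hypothetical name:

```bash
#!/bin/bash
# Sketch: one keyspace name per line, sorted, whatever column layout cqlsh used.
split_keyspace_names() {
  tr -s '[:space:]' '\n' | sed '/^$/d' | sort
}
```

Usage: cqlsh -e "DESC KEYSPACES" | split_keyspace_names > Keyspace_name_schema.cql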
Thanks for commenting!
Surely it will help the readers.
Regards
Sharad
Great script. I was able to make some very small changes and use it in my Cassandra environment, but I am having issues restoring the backup. Do you have a restore script as well, or any advice?
Hi,
I have not created a Cassandra restore script for the blog. This script backs up two things:
1. KEYSPACE SCHEMA (As a cql file using cqlsh command)
2. KEYSPACE DATA (As a snapshot)
To restore the SCHEMA:
1. Copy the SCHEMA backup file (i.e. the keyspace schema .cql file) to the current login user's home directory.
2. cd ~
3. Log in to the cqlsh console by running:
cqlsh
4. In the cqlsh console, use the SOURCE command to load the keyspace schema backup file:
SOURCE 'KEYSPACE_SCHEMA_BACKUP.cql';
In the above, replace KEYSPACE_SCHEMA_BACKUP.cql with the name of your Cassandra schema backup file.
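With many keyspaces, the same restore can be done non-interactively, since cqlsh's -f option executes the statements in a file. A sketch that replays the newest schema file of each keyspace, assuming the SCHEMA layout the backup script creates (one sub-directory per keyspace holding <keyspace>_schema-<timestamp>.cql files); restore_latest_schemas and the CQLSH override are hypothetical names. Replaying fails for keyspaces that already exist, and you will usually want to remove the system keyspace directories from the input first.

```bash
#!/bin/bash
# Sketch: replay the newest schema file of every keyspace in one SCHEMA backup dir,
# e.g. /backup/2015-04-27/SCHEMA, laid out as
#   <schema_dir>/<keyspace>/<keyspace>_schema-<YYYY-MM-DD-HHMM-SS>.cql
# CQLSH (hypothetical override) may name another cqlsh binary; defaults to "cqlsh".
restore_latest_schemas() {
  local schema_dir="$1" cqlsh="${CQLSH:-cqlsh}" ks_dir f latest
  for ks_dir in "$schema_dir"/*/; do
    latest=""
    # Globs expand in sorted order and the timestamp in the name sorts
    # chronologically, so the last existing match is the newest file.
    for f in "$ks_dir"*.cql; do
      [ -e "$f" ] && latest="$f"
    done
    [ -n "$latest" ] || continue
    echo "Restoring schema from $latest" >&2
    "$cqlsh" -f "$latest" || echo "cqlsh reported an error for $latest" >&2
  done
}
```

Usage: restore_latest_schemas /backup/2015-04-27/SCHEMA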
To restore a SNAPSHOT:
1. Copy the keyspace snapshot backup directory to the relevant Cassandra keyspace data directory.
I assume you know where the Cassandra data directory is on your system; by default it is /var/lib/cassandra/data/.
Try this on a test machine first.
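In the DataStax restore procedure (linked in a reply further down), it is the files inside the snp-... directory, not the directory itself, that go into the live table directory /var/lib/cassandra/data/<keyspace>/<table>-<uuid>/, after which Cassandra must load them, for example with nodetool refresh <keyspace> <table> or a node restart; the procedure may also involve truncating the table first. A sketch for one table on one node, assuming the layout the backup script produces and table directories named <table>-<uuid> (Cassandra 2.1 and later). restore_table_snapshot and the NODETOOL override are hypothetical names, and skipping manifest.json, schema.cql and sub-directories such as secondary-index data is a simplifying assumption.

```bash
#!/bin/bash
# Sketch: put one table's backed-up snapshot back in place on a single node.
#   $1 = snapshot backup dir, e.g. /backup/<date>/SNAPSHOTS/<ks>/<table>-<uuid>/snapshots/snp-...
#   $2 = live table dir,      e.g. /var/lib/cassandra/data/<ks>/<table>-<uuid>
# NODETOOL (hypothetical override) may name another nodetool binary; defaults to "nodetool".
restore_table_snapshot() {
  local snap_dir="$1" table_dir="$2" nodetool="${NODETOOL:-nodetool}"
  local f table keyspace
  [ -d "$snap_dir" ] && [ -d "$table_dir" ] || return 1
  # Copy the SSTable files straight into the table directory (not into its
  # snapshots/ sub-directory); snapshot metadata files and sub-directories are skipped.
  for f in "$snap_dir"/*; do
    case "${f##*/}" in manifest.json|schema.cql) continue ;; esac
    if [ -f "$f" ]; then cp -p "$f" "$table_dir"/ || return 1; fi
  done
  # Table dirs are named <table>-<uuid>: strip the suffix; the parent dir is the keyspace.
  table=${table_dir%/}; table=${table##*/}; table=${table%-*}
  keyspace=${table_dir%/}; keyspace=${keyspace%/*}; keyspace=${keyspace##*/}
  # Ask Cassandra to load the newly copied SSTables without a restart.
  "$nodetool" refresh "$keyspace" "$table"
}
```

Each node holds only its own snapshot files, so this has to be repeated on every node.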
Regards
Sharad
Hi,
Very helpful script, but I am a little confused about which directory I need to copy. Do I need to copy the contents of the SNAPSHOT directory? When the backup script ran, the following structure was created: BACKUP_FOLDER -> folders called SCHEMA and SNAPSHOTS -> inside SNAPSHOTS, a keyspace folder -> a table-name-with-UUID folder -> inside it, another folder called snapshots -> inside that, multiple snp-date folders with different timestamps, since I scheduled the script in cron every 3 hours. My question is whether I need to copy this snpdatexxx directory to /var/lib/cassandra/data/anlaytic (where anlaytic is my keyspace name).
Hi Prasanth,
I think this document will clear up your doubts. Try the steps practically on a test setup; that should answer your questions, and you will gain some additional knowledge as well.
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupSnapshotRestore.html
Regards
Sharad
With minor modifications, I am able to use it in a clustered env. Thank you very much!
Thank You Chandra,
I appreciate the feedback. I am happy to see that you understood the logic of the script and modified it for your environment.
Regards
Sharad
The script is nice. I have Cassandra 1.2, where the cqlsh command uses the -f option; it does not accept the -e option with inline commands. Let me know if you have a script that works with earlier versions of Cassandra. Thanks.
Hello Friend,
You can make changes in the script. As far as I know, old versions have two commands, cqlsh and cassandra-cli; check whether either of them works for you.
I have not created a backup script for old versions of Cassandra.
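For example, where cqlsh accepts -f but not -e, each cqlsh -e "..." call in the script could be routed through a small wrapper that writes the statement to a temporary file first. A sketch, assuming your cqlsh build runs the statements in a file given with -f (as reported above for 1.2) and that DESC works in that mode; cql_exec and the CQLSH override are hypothetical names:

```bash
#!/bin/bash
# Sketch: run one statement through "cqlsh -f" instead of "cqlsh -e".
# $1 = statement WITHOUT a trailing semicolon (one is added here).
# CQLSH (hypothetical override) may name another cqlsh binary; defaults to "cqlsh".
cql_exec() {
  local tmpfile rc
  tmpfile=$(mktemp) || return 1
  printf '%s;\n' "$1" > "$tmpfile"
  "${CQLSH:-cqlsh}" -f "$tmpfile"
  rc=$?
  rm -f "$tmpfile"   # clean up the temporary statement file
  return "$rc"
}
```

In the script, cqlsh -e "DESC KEYSPACES" would then become cql_exec "DESC KEYSPACES".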
Regards
Sharad