Incremental rsync backup with dirvish and automysqlbackup

0. Naming conventions for this tutorial

We'll call the server holding the backup "back.example.com" or just "back", and the production server whose files you wish to save "prod.example.com" or just "prod". Something like this:

   prod# echo test

shows that you need to type "echo test" on your production server.

Note that this howto is written for Debian; there might be some differences on other operating systems.

1. Why automysqlbackup and dirvish?

Using FTP for backups works, and DTC helps by generating a backup script that is started every day/week/month. While this is easy to use, it always transfers files that have already been saved, and it doesn't keep a history (it is not an incremental backup).

For this reason, we recommend that advanced users use dirvish and automysqlbackup instead. The resulting backup has a few days of history for all files, and daily, weekly and monthly backups of all your databases. It also performs much better (no need for CPU-intensive tasks like creating tar.gz archives), saves a lot of bandwidth because files already sent to the backup server are not sent again, and is secure because it runs over ssh. The setup below is also a good way to back up a server that uses DTC.

2. Setting-up automysqlbackup on your server

Using rsync for backups works well for everything except MySQL databases, because the files of a running database can change while they are being copied. This is why you should use automysqlbackup. This software is of the "install and forget" kind, and it is extremely easy to use (by default there is nothing to configure).

Because we loved the simplicity of automysqlbackup, GPLHost created a Debian package for it. It is available only in Squeeze and SID (not in Lenny), so if you use Lenny, you will need to add one of the GPLHost mirrors to your apt sources (which you must have already done if you use DTC).

To install automysqlbackup, just do:

    prod# apt-get install automysqlbackup

By default it will save all of your MySQL databases in /var/lib/automysqlbackup. If you want to change this, for example to /var/www/automysqlbackup so that the dumps are later saved together with the rest of the files in /var/www, you can edit /etc/default/automysqlbackup.

Note that by default, automysqlbackup keeps one backup per day for the last week, one per week for the last month, and one per month for the last year. It appears to me that this is enough for most situations, unless you need dumps every hour (in which case you can change how often automysqlbackup is started).
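
As a minimal sketch of the change described above, the helper below rewrites the backup directory setting in a defaults file. It assumes that the Debian package reads a variable named BACKUPDIR from /etc/default/automysqlbackup, so check the file on your system first. The function name set_backupdir is made up for this example.

```shell
#!/bin/sh
# Sketch only: set_backupdir is a hypothetical helper, and BACKUPDIR is
# assumed to be the variable name used in /etc/default/automysqlbackup.
set_backupdir() {
    conf=$1      # path to the defaults file
    newdir=$2    # directory that should receive the dumps
    if grep -q '^BACKUPDIR=' "$conf"; then
        # Replace the existing setting, going through a temporary file
        # and writing back in place so the file keeps its permissions.
        sed "s|^BACKUPDIR=.*|BACKUPDIR=\"$newdir\"|" "$conf" > "$conf.new" &&
            cat "$conf.new" > "$conf" &&
            rm -f "$conf.new"
    else
        # No setting yet: append one.
        printf 'BACKUPDIR="%s"\n' "$newdir" >> "$conf"
    fi
}
```

After loading the function in a root shell on the production server, it could be called like this:

   prod# set_backupdir /etc/default/automysqlbackup /var/www/automysqlbackup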

3. What is the principle of Dirvish

Dirvish uses rsync to do its backups. On the backup server, you install dirvish and rsync; on the server you wish to back up, you only need rsync. I stress it again: DO NOT install dirvish on the server you wish to back up, it is not needed.

Dirvish uses "vaults" where files are stored. These are in fact just backup repositories containing both the configuration/definition of the target server you want to back up, and the files themselves.

4. Installing software and creating the dirvish user on the production server

First of all, install ssh, rsync and sudo:

   prod# apt-get install ssh sudo rsync

Then you need to create a username so that the backup server can login and use rsync. We always use "dirvish" as username, but you could use virtually anything. Then you have to configure /etc/sudoers to allow the "dirvish" user to use rsync as root. Here's how:

   prod# adduser --disabled-password --gecos "Dirvish backup user" dirvish
   prod# echo "dirvish ALL = NOPASSWD: /usr/bin/rsync" >>/etc/sudoers

We set up the user with --disabled-password because the backup server will log into the production server using ssh keys only.
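
Appending to /etc/sudoers with echo works, but a typo there can lock you out of sudo. On systems whose sudoers file includes the /etc/sudoers.d directory (an assumption; older releases may not), a possible variant is to write the rule to its own file. The helper name write_rsync_rule and the file name used below are only illustrative.

```shell
#!/bin/sh
# Sketch: write the rsync rule for the backup user into a separate sudoers
# file. write_rsync_rule is a hypothetical helper name.
write_rsync_rule() {
    user=$1    # account the backup server logs in as
    file=$2    # e.g. /etc/sudoers.d/dirvish (assumes sudoers.d is included)
    # Same rule as in the echo command above, with read-only permissions.
    printf '%s ALL = NOPASSWD: /usr/bin/rsync\n' "$user" > "$file" &&
        chmod 0440 "$file"
}
```

It could be used like this, followed by a syntax check with visudo:

   prod# write_rsync_rule dirvish /etc/sudoers.d/dirvish
   prod# visudo -c -f /etc/sudoers.d/dirvish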

5. Setting-up ssh keys

As root user on the backup server, create a pair of private and public ssh keys:

   back# ssh-keygen -t rsa

When asked for the path, just press enter. When asked for a passphrase, leave it empty (just press enter), because nobody will be at the keyboard to type it when the backup runs. This is still safe as long as the private key stays readable only by root on the backup server. Next, you have to copy the public key of the backup server to the production server. Display the public key on the backup server with:

   back# cat /root/.ssh/id_rsa.pub

Then set up the ssh directory and key file for the dirvish user on the production server. Something like this:

   prod# mkdir /home/dirvish/.ssh
   prod# chown dirvish.dirvish /home/dirvish/.ssh
   prod# chmod 700 /home/dirvish/.ssh
   prod# echo "ssh-rsa <your-ssh-key>" >/home/dirvish/.ssh/authorized_keys2
   prod# chown dirvish.dirvish /home/dirvish/.ssh/authorized_keys2
   prod# chmod 600 /home/dirvish/.ssh/authorized_keys2

Replace "ssh-rsa <your-ssh-key>" in the echo command above with the content of the public key displayed earlier, so that it ends up in the newly created file "authorized_keys2".

Note that copying the email-like address at the end of the key is often a bad idea, because if the rDNS of the backup server is wrong, your production server may not let you in. Next, as root on the backup server, try to log in as the dirvish user on your production server:

   back# ssh dirvish@prod.example.com

It will print a warning about the fingerprint of your production server's sshd. Type "yes", and you should get in without typing a password. You are then connected to your production server from the backup server as user "dirvish".
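
Once the interactive login has worked, it can be useful to confirm that the same login, plus rsync through sudo, also works without any prompt, since that is how the backup will run. The following is a small sketch of such a check; check_backup_login and the SSH override variable are names made up for this example.

```shell
#!/bin/sh
# Sketch: confirm the backup server can log in with its key and run rsync
# through sudo without any prompt. check_backup_login and the SSH override
# variable are names made up for this example.
SSH=${SSH:-ssh}

check_backup_login() {
    target=$1    # e.g. dirvish@prod.example.com
    # BatchMode makes ssh fail instead of asking for a password;
    # sudo -n makes sudo fail instead of asking for one.
    $SSH -o BatchMode=yes "$target" sudo -n rsync --version >/dev/null 2>&1
}
```

For example:

   back# check_backup_login dirvish@prod.example.com && echo "login and sudo rsync OK"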

6. Setting-up Dirvish on your backup server

If you haven't done it yet, it is time to install dirvish:

   back# apt-get install dirvish

First, you have to write the main configuration file of dirvish. Here's an example of what should be in /etc/dirvish/master.conf on your backup server:

   bank:
           /var/backup
   index:  text

   exclude:
           lost+found/
           /proc
           /sys
           /etc/mtab
           /var/cache/apt
           core
           *~
           .nfs*
           /tmp/*
           subdomains.aufs

   Runall:
           prod.example.com
   image-default: %Y%m%d
   expire-default: +9 days
   rsh: ssh -o "BatchMode yes" -o "StrictHostKeyChecking no"

I believe that the above is self-explanatory, but here are a few comments on it anyway. The "bank:" directive tells where you will store your backup files. "Runall:" lists the folders in /var/backup that are "vaults" and that should be backed up on each run. Here, this means /var/backup/prod.example.com is where your production server files will be saved. Of course, you can list more than a single vault in "Runall:". Now, you have to configure your vault. First create the folder:

   back# mkdir -p /var/backup/prod.example.com/dirvish

Here's an example configuration file. In this example, it is stored in /var/backup/prod.example.com/dirvish/default.conf:

   rsync-client: sudo rsync
   client: dirvish@prod.example.com
   tree: /var/www

This tells dirvish which user to log in as, which command to use for rsync on the production server, and which directory to back up.

The tree option doesn't allow multiple folders per vault. If you need several folders in one vault, a work-around is to set the tree to / and use the exclude list to exclude everything except the folders marked with +. The + means to include the folder, and the includes must be specified before the excludes. This allows you to back up all your DTC settings in one backup.
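
The work-around with tree set to / could look like the sketch below, which writes such a vault configuration to a file. It is an illustration under assumptions, not a tested recipe: it relies on dirvish handing the exclude lines to rsync unchanged, and the directory list (/var/www and /etc/mlmmj) and the helper name write_multi_dir_conf are only examples.

```shell
#!/bin/sh
# Sketch: generate a vault configuration that backs up several directories
# from one tree. write_multi_dir_conf is a hypothetical helper; the directory
# list (/var/www, /etc/mlmmj) is only an illustration.
write_multi_dir_conf() {
    conf=$1    # e.g. /var/backup/prod.example.com/dirvish/default.conf
    # Each wanted directory needs its parents included too, and the final
    # "- *" line excludes everything that no earlier "+" line matched.
    cat > "$conf" <<'EOF'
rsync-client: sudo rsync
client: dirvish@prod.example.com
tree: /
exclude:
        + /var/
        + /var/www/
        + /var/www/**
        + /etc/
        + /etc/mlmmj/
        + /etc/mlmmj/**
        - *
EOF
}
```

For example:

   back# write_multi_dir_conf /var/backup/prod.example.com/dirvish/default.conf

It is worth checking the tree of the first image afterwards to make sure that only the intended folders were copied.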

If you have commands to run before or after the backup, you can use the pre-client, post-client, pre-server and post-server options. The -client commands run on the production server, while the -server commands run on the backup server. This could be done like this, in default.conf:

   pre-client: /etc/init.d/mysql stop
   post-client: /etc/init.d/mysql start

Note that this is just an example; of course you don't really want to shut down your MySQL server during a backup operation that could take a while...

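You can also use pre-client as another way to back up all of your MySQL databases, by having it run a dump script on the production server:

   pre-client: /etc/dirvish/sqldump-zip.sh

The script /etc/dirvish/sqldump-zip.sh, on the production server, could look like this: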

   #!/bin/sh

   #Remove the last backup
   rm -f /var/www/MyDatabaseBackup.zip

   #Dump the entire database
   /usr/bin/mysqldump -A -a -e -pMyPassword >  /var/www/MyDatabaseBackup.sql

   #Compress the sql file (saves about 80% of space)
   /usr/bin/zip /var/www/MyDatabaseBackup.zip /var/www/MyDatabaseBackup.sql

   #Remove the redundant sql file
   rm -f /var/www/MyDatabaseBackup.sql

Note that putting the zip file in /var/www will cause it to be backed up when your backup runs.

7. Running the first backup by hand

Dirvish requires you to do the first backup "by hand". This is very easy, here's an example:

   back# dirvish --vault prod.example.com --init

Note that because the first backup will take a long time, it is highly recommended to run it inside a "screen" session, so that the backup is not interrupted if your ssh connection drops.

That is it, everything is ready. From now on, your backup server will log into your production server every day to do a backup using rsync over ssh. There's nothing more you have to do for this setup! But what about the SQL databases we talked about before? Well, if you configured automysqlbackup to write its dumps to /var/www/automysqlbackup as suggested earlier, they will be included in the dirvish backup as well, so no worries here.
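
To check how the first run (or any later one) went, you can look at the summary file that dirvish writes into each image folder. The sketch below prints the status of the newest image of a vault. It assumes that images are named with the %Y%m%d pattern from master.conf and that the summary file contains a line starting with "Status:"; the helper name latest_image_status is made up for this example.

```shell
#!/bin/sh
# Sketch: report the status recorded for the newest image of a vault.
# latest_image_status is a hypothetical helper; it assumes images are named
# with the %Y%m%d pattern and that each summary file has a "Status:" line.
BANK=${BANK:-/var/backup}

latest_image_status() {
    vault=$1
    # Date-named directories sort chronologically, so the last one is newest.
    latest=$(ls -d "$BANK/$vault"/[0-9]* 2>/dev/null | sort | tail -n 1)
    [ -n "$latest" ] || { echo "no image found for $vault" >&2; return 1; }
    status=$(sed -n 's/^Status: *//p' "$latest/summary")
    echo "$(basename "$latest"): ${status:-unknown}"
    # Succeed only when the recorded status is exactly "success".
    [ "$status" = "success" ]
}
```

For example:

   back# latest_image_status prod.example.com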

8. Setting up a CRON job

If you installed Dirvish with apt-get on Debian, a cron job has been created for you automatically.

If not, your cron job should call:

   /usr/sbin/dirvish-expire --quiet && /usr/sbin/dirvish-runall --quiet

Note, if you installed Dirvish using the install script in the tar file (from the Dirvish website) then the dirvish executables may be located in /usr/local/sbin.

Make sure that /usr/local/sbin is in the path when you run the cron job. dirvish-runall calls dirvish from the path. If dirvish-runall can't find dirvish your job won't be run but no error message is produced.

A suggested script to run both commands and set the path, saved for example as /etc/dirvish/dirvish-cronjob, might be:

   #!/bin/sh
   PATH=$PATH:/usr/local/sbin/
   /usr/local/sbin/dirvish-expire --quiet
   /usr/local/sbin/dirvish-runall --quiet
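
Because a missing dirvish executable produces no error message, the script could check for it before starting. Here is a small sketch of such a check; require_on_path is a name made up for this example.

```shell
#!/bin/sh
# Sketch: fail loudly when a needed program is not on the PATH.
# require_on_path is a hypothetical helper name.
require_on_path() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: $1 not found in PATH ($PATH)" >&2
        return 1
    fi
}
```

In the script above, a line such as "require_on_path dirvish || exit 1" placed after the PATH assignment would then stop the job with a message that cron can mail to you.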

To run the job every night at 22:04, you might add this line to /etc/crontab (or to a file in /etc/cron.d):

   4 22 * * *     root     /etc/dirvish/dirvish-cronjob

9. Restoring from the backup

Having a backup but not knowing how to restore from it is pointless, right? Well, with dirvish, it's extremely simple. Let me show you:

   back# ls /var/backup/prod.example.com
   20100530  20100601  20100603  20100605  20100607  dirvish
   20100531  20100602  20100604  20100606  20100608

Each day, a new folder like the ones above is created, named after the date. In each folder, you have something like this:

   back# ls /var/backup/prod.example.com/20100608
   index  log  summary  tree

Above, only "tree" is a folder; the others are just files you can ignore. In /var/backup/prod.example.com/20100608/tree, you have a complete copy of all the backed-up files, so restoring is simply a matter of copying files back from there. If a file has not changed since the previous days' backups (e.g. it has been there unchanged for 10 days), then it is stored as a hardlink. This means the file takes up storage space only ONCE for all 9 days of history, even though it appears in each individual daily backup as if it were a normal file.
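
Copying files back can be done with rsync in the other direction. The sketch below pushes one directory from a dated image back to the production server, reusing the dirvish login and the sudo rule set up earlier. The names restore_from_image, BANK and RSYNC are chosen for this example, and the site path in the usage line is hypothetical.

```shell
#!/bin/sh
# Sketch: push one directory from a dated image back to the production
# server. restore_from_image, BANK and RSYNC are names chosen for this
# example; the site path in the usage line is hypothetical.
BANK=${BANK:-/var/backup}
RSYNC=${RSYNC:-rsync}

restore_from_image() {
    vault=$1    # e.g. prod.example.com
    image=$2    # e.g. 20100608
    path=$3     # path inside the image's tree, e.g. sites/example.com/
    dest=$4     # e.g. dirvish@prod.example.com:/var/www/sites/example.com/
    # -a keeps permissions, owners and times; --rsync-path runs the remote
    # rsync through sudo, matching the sudoers rule on the production server.
    $RSYNC -a --numeric-ids --rsync-path="sudo rsync" \
        "$BANK/$vault/$image/tree/$path" "$dest"
}
```

For example, to restore one site as it was on 2010-06-08:

   back# restore_from_image prod.example.com 20100608 sites/example.com/ dirvish@prod.example.com:/var/www/sites/example.com/

Since the vault in this howto uses /var/www as its tree, paths inside "tree" are relative to /var/www. Adding rsync's -n option first gives a dry run that only lists what would be copied.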

10. Last word

This howto didn't cover /etc/mlmmj, which you would need to back up as well if you run DTC.


Page last modified on September 03, 2012, at 07:36 AM EST