HOW TO - Back up your data using Rsync and SSH.

Jayce
When I first began tinkering with this idea, the whole SSH thing kind of confused me, mostly because I didn't think SSH would be easy for an end user to utilize. While SSH is complex in design, setting up an authentication key pair is actually super easy for the end user. Essentially, key authentication gives you a one-to-one authenticated SSH connection that can be established without typing a password. Once that's in place, you can have rsync run automatically.

Before we begin, please ensure you have openssh-server installed on your file server in question.
Code:
sudo apt-get install openssh-server

Next, we need to set up a key pair. You will receive a public key and private key.
Code:
ssh-keygen

You will be asked some questions, such as whether or not you want a passphrase on the key pair. I chose no and basically left everything else at the defaults. I went with no passphrase because SSH keys are pretty darn secure, plus I wanted this to be automated, and I wasn't sure how I could automate the process with a passphrase on the key.
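
The run looks roughly like this (exact wording may vary a bit between OpenSSH versions):
Code:
Generating public/private rsa key pair.
Enter file in which to save the key (/home/jason/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/jason/.ssh/id_rsa.
Your public key has been saved in /home/jason/.ssh/id_rsa.pub.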

The public key needs to be copied into the authorized_keys file on the server. Thanks to a handy command, this is painless. Replace jason@192.168.1.150 with the user and address for your setup.
Code:
ssh-copy-id jason@192.168.1.150

It'll ask you for your password. Enter the password for the user account you're authenticating against on the file server. Once done, you should be able to run:
Code:
ssh jason@192.168.1.150

If it did not ask for a password and your prompt changed, you're good to go. If it asked you for a password, something is likely off. Please note: if you mess around with the SSH keys (deleting them, adding new ones, etc.), it'll take a reboot (some people have told me logging out and back in works fine too) to reset things. I don't know enough about the internals to explain exactly what's happening, beyond the educated guess that the session caches the key until you log out. Unless you plan to tinker around like I did, deleting the SSH keys and re-generating them over and over for learning purposes, you won't run into this issue. But if you do, I wanted to throw this out there.

So, SSH is set up and you're good to go. Now what? It's rsync's turn. You have opened the door with SSH; now you need to put it in gear with rsync. Rsync is a remote synchronization tool, and for my uses it's pretty much awesome. I suggest you folks read the rsync man page for more information. Just a side note for anybody reading this who uses Linux: please keep man pages in mind. They're quicker than Google, honestly. You can read them by going to a terminal and typing "man rsync". Of course, you can substitute any other command to read more about it as well, e.g. "man cp".

The man page goes over a whole bunch of flags. There are a few I personally use, and I'll cover them in my own words below.

-a Archive mode. This keeps the timestamps, permissions, owner, group, and various other attributes the same as the source. I like using -a because it ensures that the data on my file server matches the data on my desktop, right down to who owns what and the time stamps.

-z Compression mode. I hadn't really used this until recently. I'm not sure I notice a difference, because rsync is pretty fast to begin with, but I tack it on, mostly because, why not?

--exclude= Exclude option. Use this if you want to exclude a specific directory: trash, videos, etc. For example, let's say you want to exclude ALL hidden files/folders... you would do --exclude=.* Notice the period and the * after the equals sign? The * is a wildcard meaning EVERYTHING, but only after the period. Since hidden files/folders begin with a period, you can see how this would match .folder1, .folder2, .folder3, etc.

Note - Personally, I would definitely recommend excluding .gvfs. .gvfs is the GNOME virtual file system; it essentially acts as a mount point for network resources. Let's say your file server is accessible through .gvfs. If you rsync everything and don't exclude .gvfs, you're in essence duplicating data that already exists on your file server, because it'll exist in its primary folder as well as under .gvfs:

/home/jason/Documents
/home/jason/Music
/home/jason/Pictures
/home/jason/.gvfs/Documents
/home/jason/.gvfs/Music
/home/jason/.gvfs/Pictures

By excluding .gvfs, you avoid this altogether. If you're backing up a home directory, I'd suggest doing it. Simply --exclude=.gvfs works for me, but if you need the full path, it would of course be --exclude=/home/jason/.gvfs

--delete This will delete files on the destination that don't exist on the source. Let's say you have a folder containing 100 GB of data, simply named "data". If you rename it to "data2", your server would end up holding both data and data2 at a grand total of 200 GB. If you want the data on your server to be identical to the source, use --delete. If you want to keep some sort of "older file redundancy" (I know some people prefer this), don't use --delete. (There's a dry-run sketch just after this list if you want to preview what --delete would do first.)

--progress If you run rsync manually, you'll be able to see the progress of what's going on instead of just a flashing cursor. I only use this flag if I want to run the command manually and see what it's doing. I don't bother using this when it's "showtime" and I want it automated in the background.
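
One more tip on the flags: if you're nervous about --delete, rsync also has a -n (--dry-run) flag that shows what a run would do without actually changing anything. A quick sketch combining the flags above, using my paths (swap in your own):
Code:
rsync -azn --delete --exclude=.gvfs --progress /home/jason jason@192.168.1.150:/media/NAS/jason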

Other than that, it's just about setting up the source and destination. Let's start with the destination since, after all, we're tinkering with SSH here, so it's a tad different. For the destination, you'll need the user, server, and folder path. As I said, my name is Jason, and my file server is 192.168.1.150. My folder path on my server in particular is /media/NAS/jason. In my case, NAS is a network drive I shared out, so it's pretty specific to my situation. Yours is likely to differ, so tailor the destination to your own setup. If your "backup drive" is /media/storage and you have a folder on storage named frank, then use /media/storage/frank, etc. In my case:
Code:
jason@192.168.1.150:/media/NAS/jason
is my destination.

Now, about the sources. They're simple enough, as they're the same as above except without the user@server part. One thing worth knowing: rsync cares about trailing slashes. /home/jason copies the jason folder itself into the destination, while /home/jason/ copies just its contents.

If you want your entire home directory to be synchronized, you can do so with just:
Code:
rsync -az /home/jason jason@192.168.1.150:/media/NAS/jason

If you want your entire home directory synchronized but with the exclusion of .gvfs and the --delete flag, use:
Code:
rsync -az --exclude=.gvfs --delete /home/jason jason@192.168.1.150:/media/NAS/jason

Getting the gist of it now?

Note, you can have multiple sources as well, which is handy if you only want to back up a few specific folders to your file server. In my case, I had limited file server space, so I only wanted to back up the most important data, which to me is Documents and Pictures. Example:
Code:
rsync -az --exclude=.gvfs --delete /home/jason/Pictures /home/jason/Documents jason@192.168.1.150:/media/NAS/jason

You can then set up a cron job for this to run at specific times. I never run rsync as root, so when I set it up in cron, I set it to run as jason and just tagged the above rsync command in.
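
As a rough sketch, assuming you want a nightly 7 PM run like my parents' server does: run crontab -e as your own user (so the job runs as you, not root) and add a line like:
Code:
0 19 * * * rsync -az --exclude=.gvfs --delete /home/jason jason@192.168.1.150:/media/NAS/jason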

I've since moved away from the cron route. I shut down my computer at night, but my file server stays up all the time, so I added an entry in "Startup Applications" to do the backup for me, which is handy because it runs every time I log in. I named it NAS Backup and put the above command in the command field. Everything works like a charm with zero input needed from me. :guitar:

Quick tip: if you'd like to check out a decent rsync GUI, fire up grsync. It's easy to use and will help you structure the rsync command if you're not entirely sure of it just yet. Just note, there is no --exclude= option in the GUI, so you'll have to add it manually under Additional Options, but that's pretty darn easy to do. Grsync also doesn't use -a; instead it breaks -a up into -t -o -p -g and so on. Read the -a section of the rsync man page to see why this makes little to no difference.

Once you have it formulated the way you want, you can also do a test run, one of grsync's features, to make sure it works properly prior to giving it the green light. Assuming all is well and you're done, you can schedule this grsync job with, you guessed it, either Startup Applications or cron. Keep in mind, the syntax for it is "grsync -e jobname". So if you named the job "backup", you'd run grsync -e backup. This is the same for cron or Startup Applications.

I tested running it from Startup Applications. It brings up a GUI window when I log in showing the status of the data transfer. If I instead go the route of throwing the full rsync command into Startup Applications, it runs completely in the background.

Summary

The above was meant to be super informative, and I hope some users can set up a backup system that works for them. Keep in mind, you never know when Mr. HardDrive is going to tank on you, so plan ahead. Below is a rough summary of what you're doing, for the users who don't want to read through a mountain of text. Note: change the settings below to match your setup, unless your name happens to be Jason and your file server happens to be 192.168.1.150.

Server
Code:
sudo apt-get install openssh-server

Client
Code:
ssh-keygen

Client
Code:
ssh-copy-id jason@192.168.1.150

Client
"Startup Applications" - Select New - Name it backup or whatever you please, and add desired rsync line in the command box, such as:
Code:
rsync -az --exclude=.gvfs --delete /home/jason jason@192.168.1.150:/media/NAS/jason

Suddenly it looks significantly easier when you cut out the informative text and just read the "get to the point" sections. I suggest everybody, regardless of your platform, make sure you have sufficient backups. I just dropped a backup server in my parents' basement to back up all of their data and my brother's data every day at 7 PM. I also duplicate my data to my file server as well. You just never know when Mr. HardDrive is going to tank on you...

Out of all of the backup platforms I've looked at, rsync is easily the best one out there. It's fast, thorough, and customizable; couple that with SSH key authentication and you have a pretty much awesome setup.
 
Just to update: if you guys would like some sort of notification so you get a message indicating that the backup is complete, here's how you do it. You basically utilize a terminal command known as zenity, which pops up a generic window displaying whatever text you specify in your script. Typically when I run my backup, I sync plain copies of my data rather than packing it into archives, so even when my desktop is off (which is whenever I'm not sitting in front of it), I can still pull relevant pictures, music, etc. off the file server if need be.

Basically, I created a file in /usr/local/bin. I'm not sure why I chose that directory, but long, long ago a Linux guru suggested I throw my homemade scripts in there. Since it was an empty directory, I chose to do so and never looked back. I keep the script owned by root for security reasons. Think about it: if your user owns the script and you leave your computer unlocked, or you magically get hax0red, then whoever has access to your computer could slip an rm -rf into it without needing root, and before you know it, the script trashes your system. Leave the script owned by root, but give it executable rights (chmod +x).
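
For reference, assuming you named the script backup like I did, the ownership and permissions setup would look something like:
Code:
sudo chown root:root /usr/local/bin/backup
sudo chmod 755 /usr/local/bin/backup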

My script reads as follows:

#!/bin/bash
# Sync the important folders to the file server over SSH
rsync -az --delete /home/jason/Pictures /home/jason/Documents /home/jason/Desktop /home/jason/"Ubuntu One" /home/jason/Dropbox /home/jason/kdenlive jason@192.168.1.150:/media/NAS/jason/DesktopUbuntu
sleep 1
# Pop up a desktop notification once the sync is done
zenity --info --title="Backup" --text="Backup Complete"
sleep 1
exit

The reason I have specific entries listed to back up, rather than my entire home directory (excluding .gvfs, of course), is simply that my file server is only running a 500 GB array while my desktop has a 1 TB array. I back up what I absolutely need and that's it. Someday, when I pick up a set of 2 TB HDDs and I'm running a 6-8 TB array, it'll be a different story.

The zenity command is quite simple. zenity is of course the core of the command, and --info calls for an "informational" prompt: there are no yes/no buttons here, just a simple OK to let the user know what happened. Beyond that, if you take a look at the screenshot below, you can see how the rest of it ties in.

[Screenshot: the zenity "Backup Complete" dialog]


Of course, you can always assign a key binding to your command. I did, using Ctrl+Alt+B. Not only do I have this set to run when I log in, but I also have this key combination in case I ever want to run it manually.

[Screenshot: the keyboard shortcut entry bound to Ctrl+Alt+B]


Here's the entry that's set to run upon login:

[Screenshot: the Startup Applications entry for the backup script]


Since the screenshot sort of cuts off the "command" portions, here's the full syntax:

/usr/local/bin/./backup

It's the full path of the script. (Strictly speaking, the ./ isn't needed when you give the full path; ./ only matters when you're executing a script from your current directory, so /usr/local/bin/backup works just as well.)

Happy backups!
 
Very interesting update, Jayce! If only I could spit that out. I would spit it out then it'd come back and burn me. :lol:

I'll have to look into that.
 
Fun fact - The Ubuntu documentation team PMed me on UbuntuForums earlier. They said they would like my how to guide I posted (exact same one here, except I also posted it there) to be part of the official Rsync documentation for Ubuntu. Woop!
 
I wish I knew how to code more so I could get into the nitty-gritty bug fixing, but it sounds like user guides are a decent-sized void as well... probably why I write so many when I spend enough time with a specific project. Can't help but take something like that as a compliment, so I'm pretty okay with that. All the more reason to spend time doing more, no? ;)
 
Fun fact - The Ubuntu documentation team PMed me on UbuntuForums earlier. They said they would like my how to guide I posted (exact same one here, except I also posted it there) to be part of the official Rsync documentation for Ubuntu. Woop!

That's fantastic news! Really goes to show the effort and work you have put back into the community.
 
Here are some fun additions. I started to wonder: it's great that I get notified the backup ran, but what if I want a continual backup log? I decided to see if there was a way to send the date over the network and append it to a text file on the server in question. Sure enough, there was. Here's my script:

#!/bin/bash
# Bail out immediately if anything below fails
set -e
# Sync the important folders to the file server over SSH
rsync -az --delete /home/jason/Pictures /home/jason/Documents /home/jason/Desktop /home/jason/"Ubuntu One" /home/jason/Dropbox /home/jason/kdenlive jason@192.168.1.150:/media/NAS/jason/DesktopUbuntu
# Append a timestamp to the backup log on the server
ssh jason@192.168.1.150 'date >> /media/NAS/backup_logs/DesktopUbuntu.txt'
exit

So of course you have the long rsync line with all of the items I want to back up. I have to be selective about what I back up because I only have 500 GB in my server but 1 TB on my desktop, so I only back up what's critical to me. I don't care who you are, backing up your cloud storage makes complete sense. Kdenlive is a movie-making application, and I'm in the middle of a 15-minute video I'm putting together to present at my wedding next month, so I sure don't want to lose that. Everything else, like Documents, is self-explanatory.

The new line though is the SSH line, which reads:
ssh jason@192.168.1.150 'date >> /media/NAS/backup_logs/DesktopUbuntu.txt'

Because my SSH keys are set up, this lets me execute commands on the remote server automatically. "date" in the terminal prints the current date and time. The >> appends the output of "date" to a specific file; in my case, I want /media/NAS/backup_logs/DesktopUbuntu.txt to be that file. (Note that >> appends, whereas a single > would overwrite the file on every run.) I ended up adding backup_logs to my Samba shares, which lets me log into my file server and see when my systems last backed up. Super handy.
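
Each run appends one line, so over time the log reads something like this (timestamps here are just illustrative):

Mon Jul  2 19:00:01 EDT 2012
Tue Jul  3 19:00:04 EDT 2012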

The set -e line, from what I understand, makes the script exit immediately if any command fails. So what does that mean in plain English? If any line above the date command fails, the date command never runs, so no new entry shows up in the text file. Meaning if your backup fails, your log won't falsely record a successful run. I intentionally tried to rsync to the wrong IP to test it, and sure enough, no entry appeared. Put the correct IP back in and bingo - a new entry popped up.

If you want to get crazy, you can put the zenity line in the same script so you still get the notification. Or if you want a more silent backup but want a paper trail so you can see the history, the date entry is a winner. Just don't forget the set -e portion. It'd be a real bummer to find out your backup system had been failing for the last 5 months while falsely telling you it was completing.
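
To make that concrete, here's a rough sketch combining both scripts above into one (same paths as before; trim the source list to suit your setup):

#!/bin/bash
# Stop here if anything fails, so we never log or announce a failed backup
set -e
rsync -az --delete /home/jason/Pictures /home/jason/Documents jason@192.168.1.150:/media/NAS/jason/DesktopUbuntu
# Log the run on the server, then notify on the desktop
ssh jason@192.168.1.150 'date >> /media/NAS/backup_logs/DesktopUbuntu.txt'
zenity --info --title="Backup" --text="Backup Complete"
exit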
 
I usually use rsync to synchronize files between two servers. To do this we use rsync -avP /home/* root@new_server:/home/
 