An inexpensive method of backing up an SMB to the cloud

First let’s get some assumptions out of the way.

  • This is geared toward a relatively well-connected organization. What “well connected” means depends greatly on the size of the organization. For a relatively small company with a few hundred GB of storage to back up online, a business-class cable connection will probably meet the bandwidth requirements, but DSL or a T1 will not. The quality of the connection (downtime, etc.) is much less important than the raw amount of data you can upload.
  • I implemented a solution like this in an environment I previously worked in, and it worked well. Watch for the caveats I add throughout the document; they come from experience.
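
A rough way to check the bandwidth assumption is to estimate how long the first full upload would take. The sketch below is only back-of-the-envelope arithmetic. The 300 GB data size, the 5 Mbps cable upstream rate and the 0.8 efficiency factor are assumptions chosen for illustration; the T1 rate is the standard 1.544 Mbps.

```python
def upload_hours(data_gb, uplink_mbps, efficiency=0.8):
    """Estimate the hours needed to push data_gb gigabytes through an uplink.

    uplink_mbps is the nominal upstream rate in megabits per second.
    efficiency is an assumed fraction of that rate left after protocol
    overhead and other traffic; 0.8 is a guess, not a measurement.
    """
    if data_gb < 0 or uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need data_gb >= 0, uplink_mbps > 0, 0 < efficiency <= 1")
    megabits = data_gb * 1000 * 8  # decimal units: 1 GB = 1000 MB = 8000 megabits
    seconds = megabits / (uplink_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 300 GB stands in for "a few hundred GB". The 5 Mbps cable upstream
    # figure is hypothetical; substitute the rate you actually measure.
    for label, mbps in [("T1", 1.544), ("cable, assumed 5 Mbps up", 5.0)]:
        days = upload_hours(300, mbps) / 24
        print(f"{label}: about {days:.1f} days for the initial upload")
```

With these assumed numbers the T1 needs roughly three weeks of continuous uploading and the cable connection about one week, which is why raw upload capacity matters more than occasional downtime.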

Why online?

Let me begin by saying that I do not see this as a substitute for a traditional backup solution (I love tape) but as an additional level of protection that is geographically separated from your main location.

The idea here is to have a geographically separated backup store that protects you from data loss if you lose your datacenter. You should already be moving your tapes offsite on a regular basis. Let’s assume you are sending out weekend backups on Monday morning. That still leaves as much as a week of data that can be lost if you have a catastrophe on Sunday night. It gets even worse if you have a policy that allows the most recent backup tapes to come back onsite for recovery of lost files.

Additionally, you can eliminate most, if not all, of the need to pull current tape sets back into the datacenter for recovery, because a current backup will be available in your online backup set.

Let’s get started

Assuming you have accepted my premise that online backups are a good idea, let’s get started with a simple solution.

Where to back up

You will first need to decide where to back up to. There are a few different options:

1)      If you are lucky enough to have a second facility you can replicate to, this may be the cheapest option. Simply set up a cheap server with large drives and you are started. Most of the smaller companies this document is geared toward will not have multiple datacenters or the connectivity to use this option.

2)      Rent an inexpensive dedicated server from a hosting company. This does not, in my opinion, require the five 9s of uptime that I expect out of my web hosting company, nor will I pay for the redundancy needed to make it bulletproof. These are simply backups, so if the server is down for an hour here and there, that’s honestly not the end of the world. Most important is low-cost data transit. Second, a company that is willing to pull disk drives and send them by next-day courier could be an advantage in a disaster recovery if a large amount of data is stored. The initial upload will take some time, so you do want to avoid fly-by-night operations that may disappear after you finally finish that first long data upload.

3)      A cloud-based solution such as S3 will provide good upload speeds at a reasonable price for relatively small amounts of data that do not change frequently. S3 has the added benefit of being geographically dispersed within the Amazon cloud. Amazon claims that the loss of two data facilities will not cause a loss of your data availability. In my case S3 was too expensive for the amount of data we wanted to back up, but it is a great option if you don’t have too much data to store. I’d guess that S3 would be hard to beat well into the 500 GB+ range. Based on some rough numbers and Amazon’s S3 calculator, someone storing 500 GB with about 100 GB of file changes per month would pay around $80/month.
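
The choice between options 2 and 3 comes down to a breakeven calculation: a rented server is a flat monthly fee, while metered cloud storage grows with the amount stored and uploaded. The sketch below shows only the shape of that calculation. Every rate in it is a placeholder, not Amazon’s or any host’s real pricing, and it ignores the disk capacity limit and any transfer charges on the rented server; plug in current numbers from the providers’ own calculators.

```python
def metered_monthly_cost(stored_gb, changed_gb, storage_rate, transfer_rate):
    """Monthly cost of pay-per-GB storage: a charge on what is stored
    plus a charge on what is uploaded that month."""
    return stored_gb * storage_rate + changed_gb * transfer_rate


def breakeven_gb(flat_monthly, storage_rate, transfer_rate, change_ratio):
    """Stored size (GB) at which a flat-rate server and metered storage cost the same.

    change_ratio is the monthly changed data as a fraction of the stored data.
    """
    per_gb = storage_rate + change_ratio * transfer_rate
    if per_gb <= 0:
        raise ValueError("rates must give a positive cost per GB")
    return flat_monthly / per_gb


if __name__ == "__main__":
    # All three figures below are hypothetical placeholders.
    STORAGE_RATE = 0.10    # assumed dollars per GB stored per month
    TRANSFER_RATE = 0.10   # assumed dollars per GB uploaded
    FLAT_SERVER = 60.00    # assumed monthly rent for a cheap dedicated server

    cloud = metered_monthly_cost(500, 100, STORAGE_RATE, TRANSFER_RATE)
    size = breakeven_gb(FLAT_SERVER, STORAGE_RATE, TRANSFER_RATE, change_ratio=0.2)
    print(f"metered cost for 500 GB stored, 100 GB changed: ${cloud:.2f}/month")
    print(f"flat-rate server becomes cheaper above roughly {size:.0f} GB")
```

Below the breakeven size the metered service is cheaper; above it, the flat-rate server wins as long as its disks are large enough.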

How to do your backups

The software that I have used in the past is “Super Flexible File Synchronizer” (SFFS). There are other options available. SFFS can use FTP or S3’s built-in upload methods. In my case I used FTP, but I did test S3 and it worked well. I will not get into the details of how SFFS works; the manufacturer’s documentation does a perfectly adequate job of that. I will, however, make some comments and suggestions.

  • You will want to set some sort of retention period. I chose the last 3 versions of every file but some people will choose higher or lower numbers based on recovery needs and disk space costs.
  • Your data is stored on the Internet. Certain data should simply be skipped for this type of backup. Examples include any personally identifiable information (PII) that is protected by law, regulation or contract. All other data should be encrypted with a key that is not stored with the backup data. Make sure you have this password/key.
  • SFFS used standard zip files for encrypted backups. I really liked this. If SFFS goes away I can still get to the data manually.
  • SFFS can do delta backups for non-encrypted data. This is great from a bandwidth standpoint if you have public data that can be left unencrypted. In my case the data was mostly mixed, so I did not use the feature and cannot comment on how well it works.
  • You will be shocked at just how many files get changed on a day to day basis when you look at the backup logs. This is exacerbated by the fact that encryption and delta backups will not work together.
  • If you are backing up to an FTP site it is your responsibility to protect the server. That is a story for another day.
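
To make the retention idea concrete, here is a minimal sketch of a “keep the last 3 versions” rule. It is not how SFFS implements retention. It assumes a hypothetical naming scheme in which each archived version is stored as `<original name>.<timestamp>.zip`, and the function name `versions_to_prune` is made up for this example.

```python
from collections import defaultdict
from pathlib import Path


def versions_to_prune(archive_dir, keep=3):
    """Return the archived versions that fall outside the newest `keep` per file.

    Assumes a hypothetical naming scheme: <original name>.<timestamp>.zip,
    for example report.doc.20110206T010000.zip. Timestamps written this way
    sort chronologically as plain strings.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    by_original = defaultdict(list)
    for path in Path(archive_dir).rglob("*.zip"):
        stem = path.name[: -len(".zip")]
        original, sep, stamp = stem.rpartition(".")
        if not sep or not original:
            continue  # name does not follow the assumed scheme; leave it alone
        by_original[(path.parent, original)].append((stamp, path))

    doomed = []
    for versions in by_original.values():
        versions.sort(key=lambda item: item[0])  # oldest first
        doomed.extend(path for _, path in versions[:-keep])
    return sorted(doomed)


if __name__ == "__main__":
    # Dry run: only list what would be removed from the current directory.
    for old in versions_to_prune(".", keep=3):
        print("would delete", old)
```

The function only reports candidates. Deleting them is left as a separate, deliberate step, which is the safer default for anything that touches backups.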

That’s it

That is pretty much it. This method will ensure that you have a geographically separated backup set.

Bonus Material

Adding a level of protection for FTP server backups

If you are using FTP servers for backups, and especially if you are using low-end server hosting companies, you may want to add an additional level of protection by replicating between two low-cost providers. I have tested this and I know it works, but I have not implemented it in practice.

The process should go something like this:

1)      Set up an account with a second service provider that operates data centers geographically separated from both your main data center and your first service provider.

2)      Set up a software VPN between the two servers.

3)      Start replicating data from the original server to the new server.  Create a script that keeps this replica up to date.
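
The script in step 3 can be quite small. The sketch below is one possible shape for it, not the script I tested. It assumes the second server’s storage is reachable as an ordinary directory path (for example, mounted over the VPN), and the function name `mirror` is hypothetical. In practice a tool such as rsync is a common choice for this job.

```python
import shutil
from pathlib import Path


def mirror(source, replica):
    """Copy files that are new or changed from source into replica.

    A file counts as changed when its size differs or the source copy has a
    newer modification time. Files deleted from source are deliberately left
    in the replica, since this is a backup and not a strict mirror.
    Returns the relative paths that were copied.
    """
    source, replica = Path(source), Path(replica)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = replica / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue  # unchanged since the last run
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied.append(rel)
    return copied
```

Run from cron or another scheduler, a call such as `mirror("/backups", "/mnt/replica")` (both paths are placeholders) keeps the second copy current. Comparing size and modification time is cheap but can miss a change that preserves both; comparing checksums is slower and more thorough.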

Honestly this is not brain surgery. It just takes a little time to set up and doubles your cost of backups. This will push the breakeven between S3 and FTP servers even higher.

The picture below may help describe the data flow.

[Image: cloud backup data flow]

Last updated 2/6/2011

Comments

tape backup over online storage

For my Linux systems I still prefer tape backup over online storage because I can be assured of the privacy and safety of my data. These commands are great if ever I want to get the tape status.