Faster Cloud Backups for Large Files: How It Works
Speed is a crucial aspect of backups – ultimately, it’s the determining factor in how much data you can back up daily. If you have 400GB of data that needs to be backed up each day, and your backup solution can only manage to transfer 200GB within your backup window, you’ve got a problem (or rather, you DON’T have daily backups).
We’ve been discussing the importance of speed and backup windows a lot, as we’ve shared the results of recent speed tests pitting Zetta DataProtect against MozyPro and a Barracuda Backup appliance. The tests measured performance for the 4.7 release of Zetta DataProtect, which contained significant speed improvements for large files.
“But Zetta,” you may be thinking, “haven’t you ALWAYS claimed to be fast?” We have, and we have been! But that doesn’t mean there wasn’t still room to get even faster. For very large files in particular – like, say, server images – we were able to make a dramatic improvement with Version 4.7. Here’s how.
Four Steps to Cloud Backup
First, let’s rewind a bit and talk about what happens when we make an incremental cloud backup. The focus here is on incremental rather than initial backups because, as we’ve discussed, incremental backup speed is what matters most: it’s the speed customers will have to live with day after day.
When the Zetta service performs an incremental backup, it works in four steps: (1) it checks the server’s dataset for changes since the previous sync; (2) it runs calculations on the CPU to determine exactly what data needs to be transferred; (3) it moves the changed data over the Internet; and (4) it writes that data to the Zetta backend.
The catch was, these four steps were performed sequentially. Before the CPU calculations could start, the dataset checking had to end. The CPU calculations then had to end before the transfer could start. You get the picture. For datasets that were mostly smaller files – even large datasets with LOTS of smaller files – it was pretty darn fast. But for very large files, such as server images, it wasn’t as efficient as it could be.
With a lot of data to cover, each step naturally took longer to finish than it would for a smaller file, so a single 500GB server image backed up more slowly than 500GB of smaller files. It was a bit like a relay race: each individual leg was fast, but no leg could start until the previous one finished, so the whole was only as fast as its slowest part. This also meant that very large files couldn’t take full advantage of Zetta’s WAN-optimized architecture.
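To make the relay-race problem concrete, here is a minimal sketch of the old sequential flow. This is purely illustrative – the function names, block size, and hashing scheme are our assumptions for the example, not Zetta’s actual implementation. The key point is that each step must finish for the *entire* file before the next one begins.

```python
import hashlib

# Illustrative sketch (NOT Zetta's actual code) of the pre-4.7 sequential flow.
# Block size is tiny here for demonstration; real blocks would be far larger.

def scan_for_changes(data, previous_hashes, block_size=4):
    """Step 1: compare fixed-size blocks against hashes from the previous sync."""
    changed = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if previous_hashes.get(offset) != digest:
            changed.append((offset, block))
    return changed

def compute_transfer_set(changed):
    """Step 2: CPU work deciding exactly which bytes must move."""
    return [(offset, block) for offset, block in changed]

def transfer(transfer_set):
    """Step 3: stand-in for moving the changed data over the Internet."""
    return list(transfer_set)

def write_backend(backend, received):
    """Step 4: apply the changed blocks to the backend copy."""
    for offset, block in received:
        backend[offset] = block

def incremental_backup(data, previous_hashes, backend):
    # Sequential: step N+1 cannot begin until step N has covered the WHOLE file.
    changed = scan_for_changes(data, previous_hashes)
    transfer_set = compute_transfer_set(changed)
    received = transfer(transfer_set)
    write_backend(backend, received)
    return len(received)  # number of changed blocks sent
```

For one huge file, every stage sits idle while another works; that serialization is exactly what Version 4.7 removes.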
Better Backup Through Parallelization
Clearly, we needed to address the big-file roadblock, especially as more server images are stored in the cloud. To do this, we reconfigured the system to break large files into smaller chunks, then reassemble them at the destination. This allowed us to parallelize the process for large files: while some chunks of the file are being checked for changes, others are simultaneously using the CPU for calculations, moving over the network, or being written to the Zetta backend.
This solves the relay problem: all four steps of the process run at once, each working on different chunks, greatly improving the transfer speed. Parallelizing the steps also means that very large files can now benefit from the full range of our WAN optimization, especially our multithreaded transport technology.
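The chunk-and-pipeline idea can be sketched as follows. Again, this is a hedged illustration, not Zetta’s implementation: the chunk size, thread pool, and function names are assumptions made for the example. Because each chunk carries its own offset, the backend copy reassembles correctly no matter which chunk finishes first.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch (NOT Zetta's actual code) of the 4.7-style approach:
# split a large file into chunks, then run the check/compute/transfer/write
# steps for each chunk independently, so chunks overlap in the pipeline.

CHUNK_SIZE = 8  # tiny for demonstration; real chunks would be far larger

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Break one large file into (offset, bytes) chunks."""
    return [(i, data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]

def process_chunk(offset, chunk, previous_hashes, backend):
    """Steps 1-4 for a single chunk; many chunks run concurrently."""
    digest = hashlib.sha256(chunk).hexdigest()     # steps 1+2: detect + compute
    if previous_hashes.get(offset) == digest:
        return 0                                   # unchanged: nothing to send
    backend[offset] = chunk                        # steps 3+4: transfer + write
    return len(chunk)

def parallel_incremental_backup(data, previous_hashes, backend, workers=4):
    chunks = split_into_chunks(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent = pool.map(
            lambda c: process_chunk(c[0], c[1], previous_hashes, backend),
            chunks,
        )
        return sum(sent)  # total bytes sent; offsets keep reassembly in order
```

With many chunks in flight, a slow stage for one chunk no longer stalls the others, which is what lets a single huge file saturate a multithreaded transport.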
The results? In our speed test, Zetta DataProtect 4.7 finished an incremental backup for a 490GB server image with a 5% change rate in just 1 hour, 7 minutes. Faster incremental backups mean more data can fit in the backup window, and more data in the backup window means a larger total dataset in the cloud. That’s good news for anyone with lots of data – and that’s pretty much everyone these days, isn’t it?