Zetta Scalabytes Blog

In this blog, hear from Zetta’s founders and leaders about cloud computing, storage and data management best practices and Zetta Enterprise Cloud Storage technology.

Chris Schin

December 16, 2009

Hosting Primary, Unstructured Enterprise Data in the Cloud – Part 4: Comprehensive data integrity/protection

Chris Schin, VP Products, is responsible for coordinating all Zetta product-related initiatives including product strategy, direction, and marketing, as well as business model and go-to-market process definition. Prior to joining Zetta, Chris was acting GM and Senior Director for Symantec Protection Network, Symantec's Software as a Service platform.

Hi – for those of you just tuning in, this is part four of a nine-part blog series in which I describe the Zetta solution and how it was built from the ground up to host primary, unstructured enterprise data in the Cloud. Here again is the list of requirements; the first two have already been addressed, along with an introduction to the series:

In this post I’ll argue that for a cloud storage service to be a viable option for hosting primary enterprise data sets, it must provide comprehensive data integrity/protection.

At Zetta, our most important design consideration was data integrity (or data “protection” – whatever term is used, this is the idea that we won’t allow data to be lost or corrupted), since ultimately a data storage solution is worthless if it allows the loss or corruption of data.

A dirty little secret in the storage community is that data corruption happens all the time. Although the rate of corruption per bit stored or read is low, the sheer and growing scale of stored data means that corruption events are effectively always occurring somewhere. For more on this topic at a deeper technical level, along with a calculator to help you gauge your own data integrity risk, please see JW’s post on Calculating Mean Time To Data Loss (and probability of silent data corruption).
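
To make the scale argument concrete, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that every bit read independently has the same small probability of coming back silently corrupted; the rate of 1e-15 per bit used below is a hypothetical figure chosen for round numbers, not a Zetta or JW measurement.

```python
import math

def corruption_stats(bytes_read: float, error_rate_per_bit: float) -> tuple[float, float]:
    """Return (expected corrupted bits, probability of at least one corrupted bit).

    Simplifying, hypothetical model: each bit is independently corrupted
    with probability error_rate_per_bit.
    """
    bits = bytes_read * 8
    expected = bits * error_rate_per_bit
    # P(at least one) = 1 - (1 - p)^n, computed stably with log1p/expm1
    p_any = -math.expm1(bits * math.log1p(-error_rate_per_bit))
    return expected, p_any

if __name__ == "__main__":
    RATE = 1e-15  # hypothetical per-bit error rate, for illustration only
    for label, size in [("1 TB", 1e12), ("100 TB", 1e14), ("1 PB", 1e15)]:
        expected, p_any = corruption_stats(size, RATE)
        print(f"{label:>6}: expected errors = {expected:8.3f}, P(>=1) = {p_any:.4f}")
```

Under these assumptions a single full read of 1 PB yields about 8 expected errors, and the chance of seeing at least one is 1 - e^-8, or roughly 99.97%: a low per-bit rate still makes corruption a near-certainty at scale.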

So any solution for storing primary enterprise data MUST assume data corruption will happen, and must be designed to adapt to that reality and repair corrupted data, thereby guaranteeing data integrity.

Here are some of the unique ways the Zetta solution has been designed to automatically detect data corruption and repair it; taken collectively, these tools truly give Zetta an unparalleled data integrity profile.

Zetta Comprehensive Data Integrity/Protection Requirements

  • Write Receipts – Zetta creates a SHA-1 hash of every file that enters a Zetta customer virtual volume, and we use that hash in two ways: one optional, at the customer’s request, and one mandatory but transparent to the customer.
    • First, at a customer’s request, we can place these hashes on the customer’s volume, allowing the customer to verify that what we have stored at Zetta is exactly what the customer sent.
    • Second, we store each hash in perpetuity. On every read, we compare the hash of the file being read with the hash recorded when the file was originally received; if they differ, we repair the file before completing the read, guaranteeing that what is read is identical to what was written.

  • RAIN6 N+3 – Zetta employs a best-in-class RAID algorithm. It builds on RAID 6, which is based on Reed-Solomon encoding, and adds a third parity chunk per stripe: traditional RAID 6 keeps two, while the Zetta solution keeps three (JW lays this out in great detail in his post on Data Integrity in the Cloud). We refer to it as “RAIN” because we stripe data not just across independent disks but across independent nodes (i.e. storage servers). This level of redundancy, not available even in traditional storage hardware from top vendors, preserves the integrity (and availability) of data through up to three concurrent node failures.

  • Proactive Error Correction – In addition to creating a SHA hash of every complete file that enters the Zetta storage cloud, the Zetta solution also creates a SHA hash of every “chunk” of data that is encoded and striped across the disks in our lower-level storage servers. Then, using spare processing cycles, a background process traverses all hard drives, recomputes the hash of each chunk on disk and compares it with the stored hash, proactively detecting on-disk corruption and repairing it using our triple-parity RAIN6 encoding.

  • Snapshots – Zetta cloud storage comes with a full-featured file system (a distributed, clustered, highly parallelized file system that we’ll discuss in a future post). As with most enterprise filers, the Zetta file system provides full snapshot capabilities, with either scheduled or ad hoc snapshots, and Zetta snapshots are free from the capacity and performance limitations of single devices and fixed-size clusters. This gives customers a protection mechanism they control: once a snapshot is created, the file system is preserved in that state until the snapshot is deleted, allowing a user to go back and restore files and directories from the “.snapshot” directory, just as with any on-premises filer.

  • Geo-Replication – All customer data stored at Zetta is replicated to another data center. In 2010 we expect to begin offering full asynchronous replication to customers who want a fully mountable volume resident in another Zetta data center, whether for performance or for data integrity.
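
The write-receipt idea can be sketched in a few lines of Python with the standard hashlib module: compute a SHA-1 digest when a file is ingested, hand it back to the client, keep it, and check every later read against it. The in-memory ReceiptStore class and its method names are hypothetical stand-ins, not Zetta’s API, and the repair step is reduced to raising an error where a real system would rebuild the file from redundant data.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()

class ReceiptStore:
    """Hypothetical toy store that keeps each file's SHA-1 receipt alongside its bytes."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
        self._receipts: dict[str, str] = {}

    def write(self, path: str, data: bytes) -> str:
        receipt = sha1_hex(data)
        self._data[path] = data
        self._receipts[path] = receipt  # kept for the life of the file
        return receipt  # returned to the client as its write receipt

    def read(self, path: str) -> bytes:
        data = self._data[path]
        if sha1_hex(data) != self._receipts[path]:
            # A real system would repair from redundant data here before returning.
            raise IOError(f"checksum mismatch on {path}; repair required")
        return data

if __name__ == "__main__":
    store = ReceiptStore()
    payload = b"quarterly report contents"
    receipt = store.write("/reports/q4.txt", payload)
    # Client side: confirm the receipt matches what was actually sent.
    assert receipt == sha1_hex(payload)
    assert store.read("/reports/q4.txt") == payload
```

SHA-1 is used here only because the post names it; practical SHA-1 collisions have been demonstrated since 2017, so `hashlib.sha256` is the more common choice today where deliberate tampering, not just accidental corruption, is a concern.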
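
The value of a third parity chunk can be quantified with a simple binomial model. The sketch below assumes a hypothetical stripe of k data chunks plus m parity chunks spread over k + m nodes, each of which fails independently with the same probability p over some time window; an erasure code with m parity chunks tolerates any m failures, so the stripe is lost only when more than m nodes fail. The 10-wide stripe and 1% failure probability are illustrative assumptions, not Zetta’s actual parameters.

```python
from math import comb

def stripe_loss_probability(k: int, m: int, p: float) -> float:
    """Probability that more than m of the k + m nodes holding a stripe fail.

    Assumes independent node failures, each with probability p.
    """
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))

if __name__ == "__main__":
    K, P = 10, 0.01  # hypothetical stripe width and per-node failure probability
    for m in (1, 2, 3):  # single parity, RAID 6-style double parity, N+3 triple parity
        loss = stripe_loss_probability(K, m, P)
        overhead = (K + m) / K
        print(f"{K}+{m}: P(loss) = {loss:.2e}, raw/usable capacity = {overhead:.2f}x")
```

Each additional parity chunk costs one more chunk of raw capacity per stripe but, for small p, lowers the loss probability by orders of magnitude. The independence assumption is optimistic: real failures are often correlated (shared racks, power, or drive batches), and placing chunks on separate nodes rather than separate disks in one server is one way to weaken that correlation.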
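
The background scrub described under Proactive Error Correction can be illustrated with a toy model: data is split into equal-size chunks, each chunk’s hash is recorded at write time, and a scrubber later rehashes every chunk and rebuilds any mismatch from redundancy. To stay short, this sketch substitutes a single XOR parity chunk for Zetta’s triple-parity RAIN6 encoding, so it can repair only one bad chunk per stripe; the Stripe class and scrub function are hypothetical, and the parity chunk itself is not hashed here.

```python
import hashlib
from dataclasses import dataclass, field

def xor_bytes(chunks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

@dataclass
class Stripe:
    """Hypothetical stripe: data chunks, one XOR parity chunk, per-chunk SHA-1s."""
    chunks: list[bytes]
    parity: bytes = b""
    hashes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Recorded at write time, like the per-chunk hashes in the post.
        self.parity = xor_bytes(self.chunks)
        self.hashes = [hashlib.sha1(c).hexdigest() for c in self.chunks]

def scrub(stripe: Stripe) -> list[int]:
    """Rehash every chunk and rebuild a single bad one from parity; return repaired indices."""
    bad = [i for i, c in enumerate(stripe.chunks)
           if hashlib.sha1(c).hexdigest() != stripe.hashes[i]]
    if len(bad) > 1:
        raise IOError("more corrupt chunks than single parity can repair")
    for i in bad:
        others = [c for j, c in enumerate(stripe.chunks) if j != i]
        rebuilt = xor_bytes(others + [stripe.parity])
        # Verify the rebuilt chunk against its recorded hash before accepting it.
        if hashlib.sha1(rebuilt).hexdigest() != stripe.hashes[i]:
            raise IOError(f"chunk {i} could not be rebuilt; parity also damaged")
        stripe.chunks[i] = rebuilt
    return bad

if __name__ == "__main__":
    stripe = Stripe([b"AAAA", b"BBBB", b"CCCC"])
    stripe.chunks[1] = b"BxBB"  # simulate silent corruption on disk
    print("repaired chunks:", scrub(stripe))
    print("chunk 1 now:", stripe.chunks[1])
```

With three parity chunks, the same loop would tolerate up to three bad chunks per stripe, but rebuilding them requires a Reed-Solomon style decoder rather than a single XOR.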
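
Because snapshots appear under a “.snapshot” directory on the mounted volume, restoring a file or directory needs nothing beyond ordinary file copies. A minimal Python sketch, assuming the volume is mounted at a path you supply and that each snapshot is a subdirectory of .snapshot mirroring the volume root; that layout, the function name, and the snapshot name in the usage line are assumptions for illustration, not documented Zetta behavior.

```python
import shutil
from pathlib import Path
from typing import Optional

def restore_from_snapshot(mount_root: str, snapshot: str, rel_path: str,
                          dest: Optional[str] = None) -> Path:
    """Copy rel_path out of <mount_root>/.snapshot/<snapshot>/ into the live tree.

    Assumed layout: each snapshot is a read-only directory under .snapshot
    that mirrors the volume root. Restores to dest if given, otherwise to
    the original location, overwriting the current version.
    """
    root = Path(mount_root)
    source = root / ".snapshot" / snapshot / rel_path
    target = Path(dest) if dest else root / rel_path
    if not source.exists():
        raise FileNotFoundError(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    if source.is_dir():
        shutil.copytree(source, target, dirs_exist_ok=True)
    else:
        shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target
```

For example, `restore_from_snapshot("/mnt/zetta", "nightly.0", "finance/budget.xlsx")` would, under these assumptions, put the snapshot copy back in place; passing `dest` restores alongside the current file instead of overwriting it.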

Again, the Zetta solution was designed around a core premise: preventing data loss is our primary charter. These are some of the unique features we’ve built into the solution to live up to that charter.

Compare this with what is available today from the HTTP cloud storage vendors. I reiterate that this is not a knock on those solutions – they do an excellent job for their customer target, but their target is not the enterprise, and they don’t provide the requisite features to host primary enterprise data. These solutions do not provide write receipts, have no RAID implementations, lack proactive error correction, and offer no file systems with snapshots.

I’ll be back soon to discuss Zetta’s approach to data security & privacy.

Chris Schin

December 10, 2009

Hosting Primary, Unstructured Enterprise Data in the Cloud – Part 3: Easy to use, enterprise features

Hi – back again to discuss the requirements for hosting primary, unstructured enterprise data in the Cloud. For your convenience, I have reprinted the list of requirements from my initial post – note that I will continue to do this, and will also link from this list to previous posts:

In this post I’ll argue that for a cloud storage service to be a viable option for hosting primary enterprise data sets, it must provide easy-to-use, enterprise features.

On the face of things, this seems obvious, almost tautological: of course a solution for storing enterprise data should provide enterprise features; however, existing HTTP-centric cloud storage solutions lack key features that should be required for enterprise adoption.

So what are some examples of these “enterprise features?” Here is a list of things our enterprise users told us to include in the Zetta Cloud Storage service — things that they expect to find in a cloud storage solution hosting their primary data:

  • Feature Parity with Traditional Arrays – in essence, the cloud storage solution should come with the features you’ve come to expect from any robust NAS array running as a bump on your network today: snapshots, mount-and-write, integration with external systems (e.g. LDAP), support for existing ACLs, and so on. Without these, someone has to build replacements, and enterprises don’t have the resources on hand to re-create features they normally get from their storage systems.

  • File-based geo-replication – IT administrators have come to expect their storage technologies to handle replication. And I’m not talking about the type of replication common among the HTTP cloud object stores: those services typically rely on replication as their sole form of data protection, and employ a mechanism that is opaque to the user. What our enterprise customers asked us for was replication that results in a mountable, readable volume in another, identified data center, with all of the visibility and transparency they would get if they built the solution themselves.

  • Capacity management & visibility – An enterprise solution should provide a real-time view of exactly what is happening on your volumes: usage trends, system performance, and real-time access to events (including bad ones!). The fact that the volumes reside at a service provider shouldn’t change the fact that you want transparent visibility into what is happening with YOUR data!

  • Instant provisioning – In this particular case, you should actually expect your cloud service provider to do much better than a traditional array. With a traditional array, you need time to secure space and power, negotiate upfront capital cost with the array vendor or VAR, and then install, configure and test the array, which can take weeks or months. With a cloud service provider like Zetta, you can be up and running within minutes or hours.

  • Native support for file-based apps – this briefly restates my last post: an enterprise service should provide a full-featured file system that walks and talks like any other filer on your network, making it plug-and-play with existing enterprise architectures and file-based applications.
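
One concrete reason file semantics matter: many existing applications save documents with the classic write-to-a-temporary-file-then-rename pattern, which relies on the storage exposing a POSIX-style file system with atomic rename. A minimal Python version of that pattern follows; on a volume mounted like any other filer it should run unmodified, whereas an HTTP object store would require rewriting it against the vendor’s API. The function name is hypothetical, and whether a particular network mount provides atomic rename is an assumption to verify in your own environment.

```python
import os
import tempfile

def atomic_save(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same file system) for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to stable storage before the rename
        os.replace(tmp_path, path)  # atomic on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Nothing in this code knows or cares that the directory lives at a service provider, which is the practical meaning of “plug-and-play” for file-based applications.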

There is more that I could touch on here, but this should give you an idea of some of the things Zetta provides to our enterprise customers.

Once again, contrast this set of features with your typical HTTP-centric object stores: those solutions do not provide any of the enterprise features I’ve discussed above, because they were built by design to meet a simpler set of requirements, targeted at a non-enterprise customer.

I’ll be back soon to discuss some of the core strengths of the Zetta solution, beginning with a discussion of the Zetta data integrity solution.
