Hi — this post completes my blog series outlining the design concepts behind an enterprise-class storage service suitable for hosting not just secondary but even primary enterprise data.
Here is an outline of this series and hyperlinks to previous posts:
- Introduction
- Accessed like traditional storage
- Easy to use enterprise features
- Comprehensive data integrity/protection
- Data security/privacy
- Continuous availability
- Non-blocking performance
- Administrative transparency and control
- Provides good investment value
This last post is more business-focused and less product-focused, and concerns the need for such a service to provide good investment value to customers.
Throughout this series, I have tried to touch on the technical concepts and requirements that must be fulfilled before an IT professional will trust their data to a storage service; but if the service isn’t priced to make good business sense, it won’t work for our customers no matter how well it is designed.
When considering “good value” in this context, it is imperative to think holistically about what a storage service provider is providing to its customers, since it is far more than just storage capacity. Using an enterprise storage service should free an IT professional from having to do all of the following:
- Acquire storage capacity
- Configure file system software
- Design, provision and configure the specific solution (all done instantly)
- Obtain data center resources (space, power, cooling, etc)
- Provision networking infrastructure and bandwidth
- Protect the data (i.e. backup)
- Secure the data (i.e. encryption)
- Administer and manage the solution 24×7
- Support storage users, 24×7
Customers who have done serious analysis of their storage costs have told us that those costs range from $1.00 to over $3.00 per gigabyte per month. This may not seem intuitive in a world where capital outlay for capacity can be as low as $1.00/GB, but when all the other costs are factored in, storage gets much more expensive to do yourself. Try some comparisons using the Zetta TCO (Total Cost of Ownership) Calculator.
| Storage tier | Traditional storage ownership cost |
| Tier 1 | $3.50/GB per month |
| Tier 2 | $1.25/GB per month |
| Tier 3 | $1.00/GB per month |
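To see how a purchase price of roughly $1.00/GB can turn into a dollar or more per gigabyte per month, here is a minimal sketch of the arithmetic. Every figure in it is a hypothetical placeholder, not Zetta or customer data, and the field names are my own; substitute your own numbers.

```python
from dataclasses import dataclass


@dataclass
class StorageCosts:
    """Inputs for a rough do-it-yourself storage cost estimate (all values hypothetical)."""
    usable_gb: float             # capacity actually holding data
    capex_per_raw_gb: float      # purchase price per raw gigabyte
    raw_to_usable_ratio: float   # raw GB bought per usable GB (RAID, spares, headroom)
    copies: int                  # primary plus backup/replica copies
    amortization_months: int     # hardware refresh cycle
    monthly_opex: float          # staff, maintenance, space, power, cooling, network


def cost_per_gb_month(c: StorageCosts) -> float:
    """Return the all-in monthly cost per usable gigabyte."""
    raw_gb = c.usable_gb * c.raw_to_usable_ratio * c.copies
    monthly_capex = raw_gb * c.capex_per_raw_gb / c.amortization_months
    return (monthly_capex + c.monthly_opex) / c.usable_gb


example = StorageCosts(
    usable_gb=20_000,
    capex_per_raw_gb=1.00,
    raw_to_usable_ratio=2.0,
    copies=2,
    amortization_months=36,
    monthly_opex=25_000.0,
)
print(f"${cost_per_gb_month(example):.2f} per GB per month")
```

With these placeholder inputs, the hardware itself accounts for only about $0.11 of the monthly per-gigabyte figure; the rest is operating cost, which is the point of the comparison above.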
And there can be many other, less-immediately-obvious (but no less tangible) values to using an enterprise storage service provider, including:
- Administrator opportunity cost savings — time is now freed up to work on things that really impact the business, not managing disk rebuilds
- Future proofing — the need to upgrade your infrastructure every 3-5 years has now been offloaded to the service provider — no more data migrations!
- Maintenance costs — no more maintenance contracts to your hardware and software vendors
- No re-coding/re-architecting to use the service — if a service has been built for an enterprise, with standard access protocol support, then you can onload your data — and offload your data — without any changes to your existing infrastructure.
Reflecting back over the entire list of enterprise requirements, when Zetta started this adventure at the end of 2007, it was with the goal of providing enterprise IT professionals with a storage service designed for their primary storage needs. The service we have in market today meets the requirements that IT people told us they would need in such a service. I invite you to explore further and see the benefits that other enterprise IT professionals are already enjoying at www.zetta.net.
January 12, 2010
Hosting Primary, Unstructured Enterprise Data in the Cloud – Part 5: Data Security/Privacy
Chris Schin, VP Products, is responsible for coordinating all Zetta product-related initiatives including product strategy, direction, and marketing, as well as business model and go-to-market process definition. Prior to joining Zetta, Chris was acting GM and Senior Director for Symantec Protection Network, Symantec's Software as a Service platform.
Happy New Year! I’m back with part 5 of a nine-part blog series that describes the requirements for hosting primary unstructured enterprise data in the cloud.
This entire blog series includes an introduction and the following set of requirements:
- Introduction
- Accessed like traditional storage
- Easy to use, enterprise features
- Comprehensive data integrity/protection
- Data security/privacy
- Continuous availability
- Non-blocking performance
- Administrative transparency and control
- Good investment value
When talking to enterprise IT professionals (our customers), the second-most frequently-referenced concern/consideration (second only to “don’t lose or corrupt my data,” which was covered in my last post) is “don’t let anyone else see or steal my data.”
As this first post of the year comes right after Network World has named Zetta as one of the ‘10 Storage Startups to Watch,’ I would like to say that it is certainly rewarding to see editors such as Jon Brodkin recognize that while “many companies are concerned about the safety of trusting their information to a third party, to help ease those concerns, Zetta has built a system that encrypts data at rest, and can withstand multiple hardware and network failures without losing data.” There are certain baseline security/privacy criteria that must be met prior to trusting a cloud storage solution with primary copies of enterprise data.
- Wireline encryption: Using a storage service (as opposed to an inside-the-firewall solution) clearly implies a need to secure the data in transit from the enterprise to the service. Fortunately, this is increasingly facilitated by the protocols themselves. Most file transfer protocols and Web-optimized storage protocols have encrypted versions readily available today, including SFTP, FTPS, and WebDAV over HTTPS. Even traditional storage access protocols such as NFSv4 are building in wireline encryption, in recognition of our increasingly Internet-driven existence.
While we encourage customers to use these encrypted protocols, there are clearly use cases that require the use of unencrypted protocols. The solutions here are also tried and true: encrypt the data before sending it, contract for a dedicated network link, or work with the service provider to put in place a secure tunnel, such as a VPN.
- Logical partitioning within multi-tenancy: By some definitions (certainly mine), a service must be multi-tenant before it can be considered a “cloud” service[i]. In order for enterprise IT professionals to have confidence using a cloud storage service for enterprise data, they must know that their data cannot be accessed by anyone else while resident in service infrastructure. The first step is to ensure logical separation between customers at the “front door” of the service infrastructure, meaning the initial customer access point to the service. Virtualization makes this easy: simply house every customer’s mountpoint as a unique URI within a distinct virtual machine instance. This way, you know that your access point is completely unique to you, and is not a shared resource commingled with other users.
- At-Rest Encryption: By far the most significant feature for ensuring data security is default encryption at rest, supplied by your service provider at no additional cost. Ideally this should be facilitated by a full Public Key Infrastructure (PKI) backed by FIPS 140-2 compliant key repositories, with a strong cipher, a robust key rotation scheme, and per-customer or per-volume keys. Strong encryption at rest is really table stakes for any enterprise-class data storage service.
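To make the idea of per-customer or per-volume keys with rotation a little more tangible, here is a minimal sketch of one common approach: deriving a separate key for each volume and rotation generation from a master key using HKDF (RFC 5869). This is an illustration of key separation only, not a description of how Zetta manages keys. The function names, the identifier format, and the in-memory master key are all assumptions; in practice the master key would live in a hardware key repository, and the derived key would feed a real cipher.

```python
import hashlib
import hmac
import os


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Derive `length` bytes from master_key using HKDF (RFC 5869) with HMAC-SHA-256."""
    if length > 255 * 32:
        raise ValueError("requested key length too large for HKDF-SHA-256")
    # Extract step: condense the input key material into a pseudorandom key.
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    # Expand step: stretch the pseudorandom key, binding the output to `info`.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def volume_key(master_key: bytes, salt: bytes,
               customer_id: str, volume_id: str, generation: int) -> bytes:
    """Return a 256-bit key unique to one customer volume and one rotation generation.

    Identifiers are assumed not to contain ';' or '=' (hypothetical format).
    """
    info = f"customer={customer_id};volume={volume_id};gen={generation}".encode()
    return hkdf_sha256(master_key, salt, info)


master = os.urandom(32)   # stand-in for a key held in a hardware key repository
salt = os.urandom(32)

key_a = volume_key(master, salt, "acme", "vol-01", generation=1)
key_b = volume_key(master, salt, "acme", "vol-02", generation=1)
key_a_rotated = volume_key(master, salt, "acme", "vol-01", generation=2)
```

Because each volume and each generation gets its own derived key, exposing one volume's key reveals nothing about another volume, and rotation amounts to bumping the generation number and re-encrypting.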
To reiterate a common theme across these posts: it is important to remember that these are the baseline requirements that your cloud storage provider should take into consideration from the development phase onward. These types of customer requirements drove the design of the Zetta storage solution, which was built specifically to house enterprise primary data in the cloud.
I’ll be back in a few days to touch on the next requirement, continuous availability architecture.
[i] Note that this statement is not unique to storage services; it applies to any kind of service.
December 16, 2009
Hosting Primary, Unstructured Enterprise Data in the Cloud – Part 4: Comprehensive data integrity/protection
Hi – for those of you just tuning in, this is part four of a 9-part blog series in which I am describing the Zetta solution, and how it is built from the ground up to host primary, unstructured enterprise data in the Cloud. Here again is the list of requirements; the first two have already been addressed, along with an introduction to the series:
- Accessed like traditional storage
- Easy to use, enterprise features
- Comprehensive data integrity/protection
- Data security/privacy
- Continuous availability
- Non-blocking performance
- Administrative transparency and control
- Good investment value
In this post I’m going to discuss the position that in order for a cloud storage service to be a viable option to host primary enterprise data sets, it must provide comprehensive data integrity/protection.
At Zetta, our most important design consideration was data integrity (or data “protection” – whatever term is used, this is the idea that we won’t allow data to be lost or corrupted), since ultimately a data storage solution is worthless if it allows the loss or corruption of data.
A dirty little secret in the storage community is that data corruption happens all the time – though the relative rate of corruption seems low on its face, the increasing scale of data stored guarantees that corruption events are always occurring. For more on this topic at a deeper technical level, along with a calculator to help you gauge your own data integrity risk, please see JW’s post on Calculating Mean Time To Data Loss (and probability of silent data corruption).
So any solution for storing primary enterprise data MUST assume data corruption will happen, and must be designed to adapt to that reality and repair corrupted data, thereby guaranteeing data integrity.
Here are some of the unique ways the Zetta solution has been designed to automatically detect data corruption and repair it; taken collectively, these tools truly give Zetta an unparalleled data integrity profile.

- Write Receipts — Zetta creates a strong SHA-1 hash of every file that enters a Zetta customer virtual volume, and we do two things with that hash (one of which is optionally available at the customer’s request, one mandatory though transparent to the customer).
- First, at a customer’s request, we can place these hashes on the customer’s volume, allowing a customer to ensure that what we have stored at Zetta is what was sent by the customer.
- Second, we store each hash in perpetuity. This allows us to compare a read file with the one that was originally received; if there is any difference, we repair the file before completing the read, guaranteeing that what is read is identical to what was written.
- RAIN6 N+3 – Zetta employs a best-in-class RAID algorithm. It builds on RAID 6 (which uses Reed-Solomon encoding) and adds an additional parity node: RAID 6 traditionally has 2 parity nodes, while the Zetta solution has 3 (this is laid out in great detail by JW in his post on Data Integrity in the Cloud). We also refer to it as “RAIN” because we stripe data not just across independent disks, but across independent nodes (i.e. storage servers). This level of redundant protection is not available even in traditional storage hardware from top vendors, and it preserves the integrity (and availability) of data in the event of up to three independent node failures.
- Proactive Error Correction — In addition to creating a SHA hash of every complete file that enters the Zetta storage cloud, the Zetta solution also creates a SHA hash of every “chunk” of data encoded and striped across the disks in our lower-level storage servers. Then, using any spare system processing cycles, a background process on the system traverses all hard drives and compares those stored hashes to the current chunk on disk, proactively detecting and repairing any data corruptions on disk using our triple redundancy RAIN6 encoding.
- Snapshots – Zetta cloud storage comes with a full-featured file system (a distributed, clustered, highly parallelized file system that we’ll be discussing in a future post). As with most file systems, the Zetta file system provides full snapshotting capabilities – either scheduled or ad hoc snapshots. And Zetta snapshots are free from the capacity and performance limitations of single devices and fixed size clusters. This provides a customer-controlled protection mechanism – once a snapshot is created, the file system is preserved in that state until the snapshot is deleted, allowing a user to go back and restore files and directories from the “.snapshot” directory as with any on-premises filer.
- Geo-Replication — All customer data stored at Zetta is replicated to another data center. In 2010 we expect to begin to offer full asynchronous replication to our customers who want a fully-mountable volume resident in another Zetta data center, either for performance or for data integrity.
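The write-receipt idea in the list above can be sketched in a few lines. This is a minimal illustration assuming a local file stands in for a stored object; the function names are hypothetical and this is not Zetta's implementation. It uses SHA-1 because that is the hash named above.

```python
import hashlib
import os
import tempfile


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_receipt(path: str, receipt: str) -> bool:
    """Check that the stored file still matches the hash recorded at write time."""
    return file_sha1(path) == receipt


# Demonstration with a temporary file standing in for a stored object.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.dat")
    with open(path, "wb") as f:
        f.write(b"quarterly numbers")
    receipt = file_sha1(path)             # recorded when the file is written
    intact = verify_receipt(path, receipt)

    with open(path, "r+b") as f:          # simulate silent corruption of one byte
        f.write(b"Q")
    corrupted_detected = not verify_receipt(path, receipt)
```

A customer can run the same hash over the local copy and compare it with the receipt placed on the volume. Swapping `hashlib.sha1` for `hashlib.sha256` is a one-line change if a stronger hash is preferred.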
Again, the Zetta solution was designed with the core premise that preventing data loss was our primary charter, and these are some of the unique features we’ve put into the solution to live up to that charter.
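For a rough sense of what the third parity node in RAIN6 N+3 buys, here is a small worked calculation. It assumes each node in a stripe fails independently with the same probability before a repair completes, and that any loss up to the number of parity nodes is recoverable. The stripe width and failure probability are hypothetical values chosen only for illustration, not Zetta figures.

```python
from math import comb


def stripe_survival(data_nodes: int, parity_nodes: int, p_fail: float) -> float:
    """Probability that at most `parity_nodes` of the stripe's nodes fail.

    Assumes each node fails independently with probability `p_fail`
    over the period of interest, and that any `parity_nodes` losses
    are recoverable (the property of an erasure code such as Reed-Solomon).
    """
    n = data_nodes + parity_nodes
    return sum(
        comb(n, k) * p_fail**k * (1 - p_fail) ** (n - k)
        for k in range(parity_nodes + 1)
    )


def capacity_overhead(data_nodes: int, parity_nodes: int) -> float:
    """Extra raw capacity needed per unit of user data."""
    return parity_nodes / data_nodes


P_FAIL = 0.02      # hypothetical per-node failure probability before repair completes
DATA_NODES = 8     # hypothetical stripe width

for parity in (2, 3):
    loss = 1 - stripe_survival(DATA_NODES, parity, P_FAIL)
    print(f"N+{parity}: overhead {capacity_overhead(DATA_NODES, parity):.1%}, "
          f"loss probability {loss:.2e}")
```

Under these assumptions the extra parity node costs one more node of raw capacity per stripe and lowers the chance of an unrecoverable stripe by more than an order of magnitude.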
Compare this with what is available today from the HTTP cloud storage vendors. I reiterate that this is not a knock on those solutions – they do an excellent job for their customer target, but their target is not the enterprise, and they don’t provide the requisite features to host primary enterprise data. These solutions do not provide write receipts, have no RAID implementations, lack proactive error correction, and offer no file systems with snapshots.
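As an illustration of the proactive error correction listed earlier, here is a toy sketch of a background scrub: every chunk has a stored hash, a scrub pass re-hashes each chunk, and any mismatch is repaired from redundant data. To keep it short, the redundancy is modeled as a plain second copy rather than RAIN6 parity reconstruction, and SHA-256 is my own choice of hash; both are simplifications, and the class and method names are hypothetical.

```python
import hashlib
from typing import Dict, List


def chunk_hash(data: bytes) -> str:
    """Return the hex SHA-256 digest of one chunk."""
    return hashlib.sha256(data).hexdigest()


class ChunkStore:
    """Toy chunk store: a primary copy, a redundant copy, and a hash per chunk."""

    def __init__(self) -> None:
        self.primary: Dict[str, bytes] = {}
        self.redundant: Dict[str, bytes] = {}   # stand-in for parity-based reconstruction
        self.hashes: Dict[str, str] = {}

    def write(self, chunk_id: str, data: bytes) -> None:
        self.primary[chunk_id] = data
        self.redundant[chunk_id] = data
        self.hashes[chunk_id] = chunk_hash(data)

    def scrub(self) -> List[str]:
        """Walk every chunk, compare it to its stored hash, and repair mismatches.

        Returns the ids of the chunks that were repaired.
        """
        repaired = []
        for chunk_id, expected in self.hashes.items():
            if chunk_hash(self.primary[chunk_id]) == expected:
                continue
            candidate = self.redundant[chunk_id]
            if chunk_hash(candidate) != expected:
                raise RuntimeError(f"chunk {chunk_id} is unrecoverable")
            self.primary[chunk_id] = candidate
            repaired.append(chunk_id)
        return repaired


store = ChunkStore()
store.write("c1", b"alpha")
store.write("c2", b"bravo")
store.primary["c2"] = b"brav0"        # simulate silent on-disk corruption
repaired = store.scrub()
```

The useful property is that corruption is found and fixed by the scrub pass itself, before any reader ever asks for the damaged chunk.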
I’ll be back soon to discuss Zetta’s approach to data security & privacy.
December 10, 2009
Hosting Primary, Unstructured Enterprise Data in the Cloud – Part 3: Easy to use, enterprise features
Hi – back again to discuss the requirements for hosting primary, unstructured enterprise data in the Cloud. For your convenience, I have reprinted the list of requirements from my initial post again – note that I will continue to do this and will link from this list to previous posts as well:
- Accessed like traditional storage
- Easy to use, enterprise features
- Comprehensive data integrity/protection
- Data security/privacy
- Continuous availability
- Non-blocking performance
- Administrative transparency and control
- Good investment value
In this post I’m going to discuss the position that in order for a cloud storage service to be a viable option to host primary enterprise data sets, it must provide easy-to-use, enterprise features.
On the face of things, this seems obvious, almost tautological: of course a solution for storing enterprise data should provide enterprise features; however, existing HTTP-centric cloud storage solutions lack key features that should be required for enterprise adoption.
So what are some examples of these “enterprise features?” Here is a list of things our enterprise users told us to include in the Zetta Cloud Storage service — things that they expect to find in a cloud storage solution hosting their primary data:
- Parity to Traditional Arrays — in essence, the cloud storage solution should come with the features you’ve come to expect from any robust NAS array you have running as a bump on your network today – things like snapshots, mount-and-write, integration with external systems (e.g. LDAP), support for existing ACLs, etc. Without these, something will need to be written to replace these features, and enterprises don’t have the resources on hand to simply replicate features they typically get from their storage solutions.
- File-based geo-replication — replication is something that IT administrators have come to expect to be facilitated by their storage technologies. And I’m not talking about the type of replication common among the HTTP cloud object stores – those services typically rely on replication as their sole form of data protection, and employ a solution that is opaque to the user. What our enterprise customers asked us for was a form of replication that results in a mountable, readable volume in another identified data center, with all of the visibility and transparency they would get if they constructed their own solution.
- Capacity management & visibility — An enterprise solution should provide real-time presentation of exactly what is happening on your volumes – usage trends, system performance, and real-time access to events (including bad ones!). The fact that the volumes are resident at a service provider shouldn’t change the fact that you want transparent visibility into what is happening with YOUR data!
- Instant provisioning – In this particular case, you should actually expect your cloud service provider to be much faster than a traditional array. With a traditional array, you need time to secure space and power, negotiate with the array vendor or VAR on upfront capital cost, and install, configure and test the array. This can take weeks or months. With a cloud service provider like Zetta, you can be up and running within minutes or hours.
- Native support for file-based apps — this is kind of a short restatement of my last post – an enterprise service should provide a full-featured file system that walks and talks like any other filer on your network, making it plug-and-play with existing enterprise architectures and file-based applications.
There is more that I could touch on here, but this should give you an idea of some of the things Zetta provides to our enterprise customers.
Once again, contrast this set of features with your typical HTTP-centric object stores – those solutions do not provide any of the enterprise features I’ve discussed above, since by design they were built to meet a simpler set of requirements, targeted at a non-enterprise customer.
I’ll be back soon to discuss some of the core strengths of the Zetta solution, beginning with a discussion of the Zetta data integrity solution.
November 23, 2009
Hosting Primary, Unstructured Enterprise Data in the Cloud – Part 2: Accessed like traditional storage
Hi – it’s Chris again, and this is the second in my blog series discussing the requirements for hosting primary, unstructured enterprise data in the Cloud. Recall this list of requirements from my initial post:
- Accessed like traditional storage
- Easy to use, enterprise features
- Comprehensive data integrity/protection
- Data security/privacy
- Continuous availability
- Non-blocking performance
- Administrative transparency and control
- Good investment value
In this post, I am advocating the position that in order for a cloud storage service to be a viable option to host primary enterprise data sets, it must be accessed like traditional storage.
Why? Simply put, the design of the cloud storage service must be one that imposes no limits on how the storage can be adopted into the enterprise environment. As such, the solution should appear just like any other traditional NAS filer on your network. If the cloud storage service walks and talks exactly like the NAS filers you are already using (and you probably have at least several on your network today – possibly numbering into the hundreds or thousands), then you can instantly extend your existing environment into the cloud without any modifications to your existing enterprise application infrastructure.
So what do I mean when I say “walks and talks exactly like the NAS filers you are already using?” This list gives some tangible examples; in order to be viable as a repository for enterprise unstructured data, the cloud storage service should be:
- Mounted like any existing network share — mountable as a Unix or Windows network share exactly as if it were on your network inside your firewall.
- Accessed via a full-featured, distributed file system — accessed over existing paths, directories, permissions, and commands, and seamlessly leveraging external system integrations (e.g. LDAP), while delivering all the capabilities of a traditional session (e.g. ACLs and strong consistency). In order to provide this, the cloud storage service must be accessed via a complete, full-featured file system.
- Served up over traditional protocols across all operating systems — accessed over the protocols that your applications and operating systems use today (and have for many years) – mount the storage as a Windows share over CIFS, as a Unix share over NFS, access it as an FTP server for large file transfers, access it in an HTTP-optimized way over WebDAV, etc.
- POSIX compatible — this is perhaps a bit more esoteric than the first three, but it is no less important; in order to be viable as a repository for enterprise unstructured data, the cloud storage service should be compatible with the POSIX command set. POSIX has been around and in use in enterprises for 20+ years, and virtually all enterprise applications are written with the expectation that the POSIX command set will be available. One key piece of this is strong consistency – POSIX compatibility ensures that any read from the data set will yield the data from the most recent write. Without this, the applications must be modified to manage cases where a read is yielding out-of-date data.
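To show what "read yields the most recent write" means for an application, here is a minimal sketch of the kind of file code that enterprise applications already contain. It uses only ordinary file calls, so it would run unchanged against any mounted share that honors these semantics. A temporary directory stands in for the mounted volume, and the mount path in the comment is hypothetical.

```python
import os
import tempfile


def write_then_read(directory: str, name: str, payload: bytes) -> bytes:
    """Write a file with ordinary file-system calls and immediately read it back."""
    path = os.path.join(directory, name)
    tmp_path = path + ".tmp"

    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())         # ask the file system to make the write durable

    os.replace(tmp_path, path)       # atomic rename: readers see old or new, never partial

    with open(path, "rb") as f:
        return f.read()              # with strong consistency this equals `payload`


# A temporary directory stands in for a mounted share such as /mnt/cloud-volume (hypothetical).
with tempfile.TemporaryDirectory() as share:
    result = write_then_read(share, "ledger.dat", b"balance=42\n")
```

Nothing in this code knows or cares where the directory lives, which is exactly the property that lets existing applications move to a mounted cloud volume without modification.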
There is more to this concept, but this should give you the idea of what I mean. It also follows that since Zetta was designed and built for the express purpose of hosting primary, unstructured, enterprise data in the cloud, a Zetta storage solution fulfills these requirements.
Contrast this with the current generation of HTTP-centric object stores[i]:
- HTTP object stores are not “mounted” like an existing network share with a robust set of commands available; instead, they are accessed using GET/PUT/POST/DELETE Web APIs, and your enterprise applications would need an overhaul to support that change.
- HTTP object stores lack the file system semantics your applications expect, instead using a simplified “bucket”/“object” storage paradigm.
- HTTP object stores leverage (often proprietary) Web APIs (either REST or SOAP), not the NFS/CIFS/FTP protocols your operating systems are made to use.
- HTTP object stores leverage an “eventual consistency” model that violates POSIX and does not ensure that your applications will read back the most-recent write. Again, your applications would require costly modification to adapt to this deficiency.
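The eventual consistency point in the list above is easier to see with a toy model. The sketch below is not any vendor's API; it is a deliberately simplified, hypothetical store in which writes land on one replica and reach the replica serving reads only when replication runs.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class EventuallyConsistentStore:
    """Toy object store: writes land on one replica and reach the other later."""

    def __init__(self) -> None:
        self._write_replica: Dict[str, bytes] = {}
        self._read_replica: Dict[str, bytes] = {}
        self._pending: Deque[Tuple[str, bytes]] = deque()

    def put(self, key: str, value: bytes) -> None:
        self._write_replica[key] = value
        self._pending.append((key, value))     # replication happens later

    def get(self, key: str) -> Optional[bytes]:
        return self._read_replica.get(key)     # may be served by a lagging replica

    def replicate(self) -> None:
        """Apply all queued writes, as background replication eventually would."""
        while self._pending:
            key, value = self._pending.popleft()
            self._read_replica[key] = value


store = EventuallyConsistentStore()
store.put("config", b"v1")
store.replicate()
store.put("config", b"v2")
stale = store.get("config")      # replication has not caught up yet
store.replicate()
fresh = store.get("config")
```

Here `stale` still holds the old value even though the write of the new one has already returned. An application built on a strongly consistent file system never has to handle that case; one built on this model has to add retries or version checks of its own.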
More to come next week, when I tackle the requirement that an enterprise-class cloud service should provide easy to use, enterprise features.
[i] I want to reiterate that I am not bad-mouthing the HTTP object store solutions – those solutions serve the needs they were designed to serve very well (HTTP-centric use cases; e.g. Web applications), but were not built to serve the enterprise and therefore lack multiple key features that you (an enterprise IT professional) need when you are looking to extend your existing infrastructure to the cloud without any rewrites to your existing application footprint.