Hello again and welcome back to my blog series outlining what our customers told us they wanted to see in a cloud storage solution before they would put primary copies of their enterprise data in the cloud. Again, it is important to note that these requirements drove the design and development of the solution we have in market today.
This is the outline of the series and hyperlinks to previous posts:
• Accessed like traditional storage
• Easy to use enterprise features
• Comprehensive data integrity/protection
• Data security/privacy
• Continuous availability
• Non-blocking performance
• Administrative transparency and control
• Provides good investment value
This post discusses how a service provider must create a storage solution architecture that can ensure “non-blocking” performance, enabling it to adapt to multiple customer access patterns simultaneously.
There is no question that innovation has allowed today's traditional arrays to scale to huge capacity, with hundreds of terabytes in a single array. But the core array architecture has changed little over time, and that architecture can limit how much additional capacity can be added, and can even prevent existing capacity from being utilized adequately. A massive-scale, multi-tenant service requires a fundamentally different design, one that borrows heavily from distributed-systems principles.
There are effectively three components to any storage solution: the network, the controller, and the disks. In a traditional array, purchase-time decisions fix the ratios among these three, and those decisions are very difficult to alter once the array has been deployed. Unfortunately, circumstances change, and one of the three components almost always becomes the bottleneck, preventing full utilization of the other two. For example, if the workload winds up being more controller-intensive than expected, the disks will never be filled.
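To make the bottleneck effect concrete, here is a minimal sketch of how one saturated component strands the capacity purchased in the other two. All of the capacity and workload figures are invented for illustration, not real array specifications:

```python
def utilization(capacity, demand):
    """Fraction of each purchased component a workload actually consumes."""
    return {component: demand[component] / capacity[component]
            for component in capacity}

# Hypothetical purchase-time ratios for a traditional array:
array = {"network_gbps": 40, "controller_kiops": 200, "disk_tb": 500}

# A workload that turned out more controller-intensive than planned:
workload = {"network_gbps": 10, "controller_kiops": 200, "disk_tb": 150}

for component, frac in utilization(array, workload).items():
    print(f"{component}: {frac:.0%} utilized")
# The controller is pinned at 100% while 70% of the disk capacity
# (and 75% of the network bandwidth) can never be used.
```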
A service provider who tries to construct a storage service from a series of high-priced traditional arrays will fall prey to this dynamic in an acute way: installing multiple arrays doesn't eliminate the problem, it multiplies it. The problem is compounded by the fact that there is no way to plan for customer behavior in advance when the customers aren't even identified prior to array purchase, as is the case for a cloud storage service provider.
A cloud service provider shouldn’t attempt to use traditional vendor-produced arrays to create a storage service — the costs don’t add up, any single customer’s access pattern could negatively impact others, and the fundamental array architecture is in conflict with the notion of a storage service.
Instead, a storage service must be architected using Internet-centric distributed-computing principles. Each tier of the architecture (throughput, IOPS, and density) should be able to scale independently of the others, allowing the service provider to adapt to customer behavior, singly and in aggregate, as necessary to ensure adequate performance for every customer and adequate system resource utilization overall.
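A rough sketch of what independent tier scaling buys the provider: demand is translated into node counts per tier in isolation, so a shift toward one access pattern grows only the affected tier. The per-node capacities below are assumptions for illustration, not figures from any real product:

```python
import math

# Hypothetical capacity contributed by one node of each tier:
NODE_CAPACITY = {
    "throughput_gbps": 10,  # per access/network node
    "iops_k": 50,           # per controller node
    "density_tb": 100,      # per storage node
}

def nodes_needed(demand):
    """Nodes required per tier, each computed independently of the others."""
    return {tier: math.ceil(demand[tier] / cap)
            for tier, cap in NODE_CAPACITY.items()}

# Aggregate customer demand shifts toward small random I/O: only the
# IOPS tier needs to grow; throughput and density stay right-sized.
print(nodes_needed({"throughput_gbps": 40, "iops_k": 900, "density_tb": 500}))
# {'throughput_gbps': 4, 'iops_k': 18, 'density_tb': 5}
```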
One additional best practice is worth mentioning: unlike processors, disks are mechanical devices that spin at a fixed maximum rate. As a result, if enough IOPS hit a disk at once, the disk saturates and its throughput falls off a cliff. Since both IOPS and density are determined by the disk, a storage service should provide a QoS engine, similar to an operating system's scheduler, that ensures disks never reach the point of no return under load, where queued I/O causes latency to climb sharply and throughput to collapse.
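One way such a QoS engine might admit I/O is a per-disk token bucket. The sketch below is a minimal illustration of that idea under stated assumptions: the class name, the 180-IOPS ceiling for a spinning disk, and the burst allowance are all invented, not the actual implementation:

```python
import time

class DiskThrottle:
    """Token-bucket limiter that caps the IOPS admitted to a single disk,
    keeping it out of the saturation regime where throughput collapses."""

    def __init__(self, max_iops=180, burst=20):
        self.rate = max_iops       # tokens (I/Os) refilled per second
        self.capacity = burst      # short bursts allowed above steady rate
        self.tokens = float(burst)
        self.last = time.monotonic()

    def try_admit(self):
        """Admit one I/O if a token is available; otherwise defer/queue it."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Deferred I/Os would sit in a queue and retry as tokens refill, so a burst from one tenant is smoothed out instead of stalling the disk for everyone.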