Opinion
Architecting Cloud Storage Solutions
posted on 23 August 2008 10:44
The cloud storage concept started as a service offering from Amazon (S3), paralleling its cloud computing offering (EC2). Behind the scenes, S3 manages many commodity hardware devices tied together by software to create a pool of storage. Emerging web companies have embraced this offering, creating industry buzz around the terms and concepts of cloud storage.
By Mike Maxey, director, product management, ParaScale, Inc.
Cloud storage is an architecture, not a service. Whether you own or rent is a secondary concern. Fundamentally, cloud storage is about easily scaling capacity and performance by adding standard hardware, and sharing access over a standard network (public internet or private intranet). Managing hundreds of servers so that they look like a single, large storage pool has proven very challenging. Early providers (e.g. Amazon) took on this burden and chased revenue via online rentals. Others (e.g. Google) hired an army of engineers to run it inside the firewall and customized the storage nodes to run applications on them. With Moore’s Law driving down commodity disk and CPU prices, cloud storage stands to be a highly disruptive technology inside the data center.
Clustered NAS systems have been around for the better part of a decade. This article reviews different architectural approaches to building a cloud or massively scalable NAS system. It is relevant to enterprise IT managers looking to build a private cloud for their own consumption, and to service providers looking to build public clouds to offer storage as a service. Architectures fall into two categories, and each can be delivered as a service, as software, or as a hardware appliance.
Traditional systems use a Tightly Coupled Symmetric (TCS) architecture that was designed to solve the problems of HPC (high-performance computing, or supercomputing) and is now being recast for scale-out cloud storage to meet fast-emerging market needs. Next-generation systems have embraced a Loosely Coupled Asymmetric (LCA) architecture that centralizes metadata and control operations; it is not well suited to HPC, but is designed to address the bulk storage requirements of cloud deployments. A summary of each follows.
Tightly Coupled Symmetric (TCS) architectures:
TCS systems were built to solve the single-file performance challenges that limited traditional NAS. HPC systems quickly overwhelmed the storage, as they demanded single-file I/O far greater than a single appliance could deliver. The industry responded by creating products that use the TCS architecture: many nodes acting in parallel, with distributed lock management (locking different parts of a file for writing) and cache coherency. The solution is elegant for the single-file throughput problem, and many HPC customers in several industries have embraced it. These products are complex, however, and require a fair degree of technical expertise to install and use.
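The lock management idea mentioned above, locking different parts of a file so several nodes can write at once, can be illustrated with a small sketch. This is a single-process toy under stated assumptions, not how any TCS product implements it: the class `RangeLockTable` and the node names are hypothetical, and a real distributed lock manager would also have to handle messaging, failures and cache invalidation between nodes.

```python
from dataclasses import dataclass, field


@dataclass
class RangeLockTable:
    """Toy byte-range write-lock table for one file (hypothetical, in-process)."""

    # Each entry is (start, end, owner); end is exclusive.
    held: list = field(default_factory=list)

    def try_lock(self, owner: str, start: int, end: int) -> bool:
        """Grant a write lock on [start, end) unless another owner overlaps it."""
        if start >= end:
            raise ValueError("empty or inverted byte range")
        for s, e, o in self.held:
            overlaps = start < e and s < end
            if overlaps and o != owner:
                return False  # another node is writing part of this range
        self.held.append((start, end, owner))
        return True

    def unlock(self, owner: str, start: int, end: int) -> None:
        """Release a range exactly as it was granted."""
        self.held.remove((start, end, owner))


table = RangeLockTable()
first = table.try_lock("node-a", 0, 1024)       # granted
second = table.try_lock("node-b", 1024, 2048)   # granted: disjoint range
third = table.try_lock("node-b", 512, 1536)     # refused: overlaps node-a
```

Every write in this model has to consult shared lock state first, which is the coordination cost that grows as more nodes write to the same file.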
Loosely Coupled Asymmetric (LCA) architectures:
LCA systems take a different approach to scale-out. Instead of implementing a strategy where every node knows everything about every action, LCA leverages a central metadata control server that is out of the data path. Centralization provides many benefits and enables a new level of scalability:
- Storage nodes can focus on serving read and write requests without needing confirmation from peers.
- Nodes can utilize different commodity hardware CPU and storage configurations and still participate in the cloud.
- Users can tune the cloud by leveraging hardware performance or virtual instances.
- Removing the overhead of massive state sharing between nodes also removes the need for custom interconnects like Fibre Channel or InfiniBand, further reducing costs.
- Mixing and matching of heterogeneous hardware gives users the ability to expand when necessary based on current economies of scale while providing perpetual data availability.
- Having centralized metadata means that the storage nodes can be spun down for deep archive applications, and the metadata will always be available on a control node.
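A minimal sketch can make the split between control path and data path concrete. Everything here is an assumption for illustration: the names `StorageNode`, `MetadataServer`, `put` and `get`, the "most free space" placement rule, and the in-memory dictionaries standing in for disks and network calls. It is not a description of any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """A commodity storage node; capacity may differ from node to node."""

    name: str
    capacity_bytes: int
    spun_down: bool = False
    blobs: dict = field(default_factory=dict)

    def write(self, path: str, data: bytes) -> None:
        self.spun_down = False  # a real node would spin its disks up here
        self.blobs[path] = data

    def read(self, path: str) -> bytes:
        self.spun_down = False
        return self.blobs[path]


class MetadataServer:
    """Central control node: tracks placement but never carries file data."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.free = {n.name: n.capacity_bytes for n in nodes}
        self.catalog = {}  # path -> (node name, size in bytes)

    def place(self, path: str, size: int) -> StorageNode:
        """Choose the node with the most free space (hypothetical policy)."""
        name = max(self.free, key=self.free.get)
        if self.free[name] < size:
            raise RuntimeError("no node has room for this file")
        self.free[name] -= size
        self.catalog[path] = (name, size)
        return self.nodes[name]

    def locate(self, path: str) -> StorageNode:
        return self.nodes[self.catalog[path][0]]

    def stat(self, path: str) -> dict:
        """Answer from the catalog alone, so spun-down nodes are not touched."""
        name, size = self.catalog[path]
        return {"node": name, "size": size}


def put(meta: MetadataServer, path: str, data: bytes) -> None:
    # Control path: ask where to write. Data path: write straight to that node.
    meta.place(path, len(data)).write(path, data)


def get(meta: MetadataServer, path: str) -> bytes:
    return meta.locate(path).read(path)
```

In this sketch a node never consults its peers: the only shared state is the catalog on the control node, which is why mixed hardware sizes and spun-down nodes are easy to accommodate.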
Cloud Choices
Within scalable NAS platforms there are many choices, but in general they fall into three delivery models: a service offering, a hardware appliance or a software solution. Each has its pluses and minuses:
- Service Model: Service offerings come to mind most commonly when one thinks of cloud storage. It is very easy to get started in this model and scalability is almost instant. And by definition, you have a copy of your data offsite. However, bandwidth is a limitation, so think about your restore model. And you have to be comfortable with your data being outside your network.
- HW Model: The deployment is behind your firewall and delivers better throughput than the public internet. Buying integrated bricks is very convenient, with a rack-and-stack model if the vendor does a good job of the install/manage process. But you are giving up some of the Moore’s Law benefits, as you are restricted in your hardware choices.
- SW Model: Has the advantages of the HW model, plus the benefits of really competitive HW buys. However, the install/manage process should be looked into carefully, since some of this SW is really difficult to install, or else requires resorting to restricted HW choices.
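The bandwidth caution in the service model is easy to quantify with back-of-the-envelope arithmetic. The sketch below is illustrative only: the function name `restore_hours`, the 10 TB data set, the link speeds and the 80% link efficiency are all assumed figures, not measurements from any provider.

```python
def restore_hours(data_bytes: float, link_bits_per_sec: float,
                  efficiency: float = 0.8) -> float:
    """Estimate hours needed to pull data_bytes over a link.

    efficiency is an assumed fraction of raw link speed that is actually
    usable after protocol overhead and contention.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 3600


TB = 10**12  # decimal terabyte

# Hypothetical full restore of 10 TB over two different links.
wan_hours = restore_hours(10 * TB, 100e6)  # 100 Mbit/s internet connection
lan_hours = restore_hours(10 * TB, 1e9)    # 1 Gbit/s network behind the firewall

print(f"WAN restore: {wan_hours:.0f} h, LAN restore: {lan_hours:.0f} h")
```

Under these assumptions the restore over the internet link takes roughly 278 hours (more than eleven days), against about 28 hours on the local gigabit network, which is why the restore model deserves attention before choosing a service.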
The following table summarizes different vendor options:
| Vendor | Tightly Coupled Symmetric | Loosely Coupled Asymmetric | Service Offering | Hardware Appliance | Software Only |
| ParaScale | X | X | |||
| Amazon | X | X | |||
| Nirvanix | X | X | |||
| EMC Hulk/Maui* | ? | ? | X | X | |
| IBM XIV* | ? | ? | X | ||
| NetApp GX | X | X | |||
| Isilon | X | X | |||
| HP ExDS9100 | X | X | |||
| BlueArc | X | X | |||
| Atrato | X | X |
* Unreleased products so details are limited
With the massive digitization of data, in an era where corporations use YouTube to distribute training videos, there is a need to put all this digital “stuff” somewhere. Businesses engaged in content creation and distribution, genomic research, medical imaging and similar fields have even more acute requirements. Cloud storage with an LCA architecture is a perfect fit for this kind of workload and provides huge cost, performance, and manageability advantages.
Mike Maxey is director of product management for ParaScale, a Silicon Valley startup focused on addressing the exploding bulk storage requirements for digital content and archival data. He can be reached at 408-861-3679 or mike@parascale.com.