Transitions: the fourth storage wave

Accelerated I/O for parallel access

A fourth wave of disk array-based storage products is coming. They are aimed at applications needing massive amounts of simultaneous I/O from high-capacity arrays, and products are arriving from the video surveillance, video streaming, high-performance computing (HPC), web 2.0 and cloud (storage as a service, or storage as part of a service) areas.

We can put together a potted disk array storage history like this:-

1. Traditional SAN/NAS array vendors – EMC, NetApp, HDS – with monolithic and modular arrays
2. Virtualised, less expensive and more intelligent arrays – 3PAR, Xiotech Magnitude, Pillar Axiom – plus iSCSI arrays – EqualLogic (Dell), LeftHand Networks
3. Virtualised and clustered and higher performance file storage – Isilon, BlueArc, ExaGrid
4. Clustered, high-I/O newcomers – Panasas, Pivot3, Atrato and Xiotech with its coming ISE product.

Possibly IBM’s XIV technology and the forthcoming EMC Hulk/Maui technology fit in this 4th wave category too.

For example, here is an extract from yesterday’s Xyratex earnings call transcript with CEO Steve Barber speaking: “During the quarter, we also began working with IBM in support of their devotion of the XIV technology in the data storage business. The resulting IBM product is designed specifically to capture and manage unstructured content, including medical images, music, videos, web pages, and other discreet files. In area of data storage, projected to see significant growth. Again, we’re working closely with IBM to support their business plan for this product.”

The characteristics of a 4th wave storage array include:

- Massively scalable capacity through a virtualised storage pool with clustered elements. Atrato has multiple sealed 2.5-inch drive mini-arrays. Pivot3 has clustered building blocks using gigabit Ethernet (RAIGE). Panasas has its ActiveStore. Xiotech, using similar base technology to Atrato, will introduce mini-arrays of sealed multiple disk drive units. These arrays typically are virtualised, stripe data widely, and employ RAID schemes that shorten re-build times by spreading data and parity across multiple spindles.

- Massively scalable performance through having multiple intelligent controllers. The mini-arrays will each have controllers. There will be caching to speed I/O as well as multiple spindles to service the caches.

- Parallel I/O with lots of ports, and massive I/O scalability by adding controllers with multiple ports, either separate from the storage array elements or combined with them.

- Fault-tolerance and self-healing schemes.

- A global namespace for files or an object-based store that could deliver files or blocks.
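
To see why wide striping and spreading data and parity across spindles matter, here is a rough back-of-the-envelope sketch in Python. It is not any vendor's actual algorithm: the function names, the simple round-robin layout and the example figures (a 500GB drive, 50MB/sec of rebuild bandwidth per drive) are all assumptions chosen for illustration, and the rebuild model ignores parity computation, controller limits and foreground I/O.

```python
from typing import Tuple


def locate_block(logical_block: int, drive_count: int) -> Tuple[int, int]:
    """Hypothetical round-robin striping.

    Returns (drive index, block offset on that drive) for a logical block,
    so consecutive blocks land on consecutive drives.
    """
    if drive_count < 1 or logical_block < 0:
        raise ValueError("need at least one drive and a non-negative block")
    return logical_block % drive_count, logical_block // drive_count


def rebuild_hours(capacity_gb: float,
                  drive_mb_per_s: float,
                  drives_sharing_rebuild: int = 1) -> float:
    """Idealised time to re-create one failed drive's contents.

    Assumption: the rebuild work divides evenly across the drives taking
    part, and each contributes drive_mb_per_s of bandwidth to it.
    """
    if drives_sharing_rebuild < 1:
        raise ValueError("at least one drive must take part in the rebuild")
    total_mb = capacity_gb * 1000          # decimal GB to MB
    aggregate_mb_per_s = drive_mb_per_s * drives_sharing_rebuild
    return total_mb / aggregate_mb_per_s / 3600


if __name__ == "__main__":
    # Where does logical block 10 live in a four-drive stripe?
    print(locate_block(10, 4))

    # One dedicated spare versus rebuild work spread over 20 spindles.
    print(rebuild_hours(500, 50, 1))
    print(rebuild_hours(500, 50, 20))
```

Under these assumptions the rebuild time falls in direct proportion to the number of spindles sharing the work, which is the effect the wide-striping vendors are after. Real arrays will do worse than this ideal.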

These are systems for transactions, but not traditional transaction processing with a back-end database fielding financial trading or that sort of thing. A transaction in the 4th wave systems is a web page request, a media stream access, a surveillance camera feed to the store, a file access request to a storage-as-a-service (SAAS) supplier. The data is semi- or unstructured in nature and there is lots of it.

For the storage system servicing these read and write requests it is like sitting under Niagara Falls, with millions of individual requests having to be serviced separately inside an overall tremendous 2-way data flow.

The 4th wave systems are centrally-managed. They don’t archive; that is some other supplier’s responsibility for now. They will replicate. They won’t de-duplicate data because access speed is key; de-duplication would likely be done by back-end DR or archive stores.

These 4th wave systems may become general in their application scope. Companies using them may migrate their e-mail storage to them for example.

Existing vendors may adopt the technology. We may well see 3PAR, Pillar and others evolving their offerings to satisfy customers who are being targeted by Atrato, Panasas, Pivot3 and Xiotech. At present the four 4th wave companies identified here would all say, I think, that they beat every other storage vendor on both performance and price/performance.

Yes, you could get as much if not more performance out of Texas Memory Systems and Solid Data DRAM-based solid state disks (SSD), but their prices rule them out of contention for video surveillance, media streaming and the generality of web 2.0/cloud service companies.

IBM and EMC appear to have entries in the 4th wave product space. HDS, HP, NetApp and Sun do not – yet. We’ll have to see whether they OEM, resell or invent their own technology. Exciting times.


