
Analysis

IBM's XIV - Release 1

posted on 02 August 2008 15:26


Introduction to basic architecture and components

Here is a picture of IBM's exciting new XIV Storage System gleaned from talking to several sources. It is in three parts.

Part the first, this one, is an introduction to the system, the current Release 1 product and its components. 

Part the second discusses the way storage capacity in the XIV is organised, presented, managed and used.

Part the third discusses the coming Release 2 version of the product, due in the third quarter of this year. It also discusses the XIV product's positioning and its competition - which will surprise you.

Note: this information is indirect, being both second- and third-hand. Verification is being sought from IBM but, until it is received, this information must be regarded as speculative.

Read on for an introduction to the revolutionary XIV product.

- - - -

 IBM XIV Storage System

The overall XIV message is tier 1 storage at tier 2 prices: tier 1 performance and reliability delivered by a unique scale-out architecture. Loosely, it is SATA storage with Fibre Channel SAN performance and reliability.

The scale-out architecture refers to the XIV system's multiple dimensions of scaling:-

- Disk capacity by adding data modules
- Cache size
- Cache-to-disk and cache-to-host bandwidth
- CPU power by adding interface modules
- CPU power used to manage the cache
- Snapshot capabilities and performance

Data written to an XIV is striped across all of its drives so that lots of spindles provide high performance levels and also high utilization without localised hot-spots. All data written to an XIV is also internally mirrored and can be copied to a remote XIV.
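The sources do not say how XIV decides where each piece of data goes. As a rough illustration of why striping everything across every spindle avoids hot-spots, here is a minimal Python sketch that assumes, hypothetically, that a volume is cut into fixed-size partitions and each partition is assigned to a drive by hashing. The function names, the SHA-256 hash and the partition count are illustrative choices, not IBM's algorithm.

```python
import hashlib
from collections import Counter

NUM_DRIVES = 120  # 8 data modules x 15 drives in one Release 1 frame


def drive_for_partition(volume_id: str, partition_no: int, num_drives: int = NUM_DRIVES) -> int:
    """Hypothetical placement rule: hash (volume, partition) onto one of the drives."""
    key = f"{volume_id}:{partition_no}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_drives


def placement_histogram(volume_id: str, num_partitions: int) -> Counter:
    """Count how many of a volume's partitions land on each drive."""
    return Counter(drive_for_partition(volume_id, p) for p in range(num_partitions))


hist = placement_histogram("vol1", 120_000)
print(f"drives used: {len(hist)}, busiest: {max(hist.values())}, quietest: {min(hist.values())}")
```

Because the hash spreads partitions close to evenly, every drive carries a similar share of the volume and none becomes a hot-spot.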

All storage in the XIV is virtualized into a single storage pool. Most components are redundant to provide 24x7x365 availability.

The XIV Storage System provides concurrent Fibre Channel and iSCSI block-level access and not file-level access. It is not a network-attached storage (NAS) system.

Interestingly, the XIV architecture has similarities to HP's ExDS9100, which also has a three-component layout: storage modules, an (unidentified) internal switch, and processing modules.

The XIV uses global hot spare capacity, but instead of dedicated hot spare drives the spare capacity is spread throughout the system: in effect, part of each drive's raw capacity is set aside for use if any drive in the system fails. The raw capacity of the XIV is therefore reduced by a percentage for sparing, and the remainder is halved for mirroring.

A system with 120TB of raw capacity has 51TB usable: 18TB (15 percent of the raw capacity) is reserved for hot sparing, and the remaining 102TB holds two mirrored copies of the data.
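The capacity arithmetic in this example can be checked with a few lines of Python. The function name is hypothetical, and the 15 percent spare fraction is simply the figure from the example rather than a documented XIV parameter.

```python
def capacity_breakdown_tb(raw_tb: float, spare_fraction: float) -> tuple[float, float, float]:
    """Split raw capacity into (spare reserve, mirrored data area, usable capacity)."""
    spare_tb = raw_tb * spare_fraction
    data_tb = raw_tb - spare_tb  # capacity left after the distributed spare reserve
    usable_tb = data_tb / 2      # every byte is stored twice
    return spare_tb, data_tb, usable_tb


spare, data, usable = capacity_breakdown_tb(120, 0.15)
print(f"spare: {spare:.0f}TB, data: {data:.0f}TB, usable: {usable:.0f}TB")
# spare: 18TB, data: 102TB, usable: 51TB
```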

Release 1 XIV Storage System

The current or Release 1 XIV Storage System is a 42U rack storage array based upon three commodity hardware components: Interface Modules (IMs); Data Modules (DMs); and internal Ethernet switches connecting any IM to any DM. The IMs offer external host connectivity and also manage the whole system.

In more detail:-

- Three 2U Interface Modules, each offering four 2Gbit/s FC ports and two iSCSI ports to the outside world. Of the 12 FC ports, six are for host use and six for remote copy; the remote mirroring ports can be reconfigured for host use. Both FC and iSCSI protocols are supported at the same time.

- Two 2U 24-port Gigabit Ethernet switches, literally at the top of the rack, for internal connectivity linking each of the three IMs to...

- Eight 3U Data Modules, each containing 15 x 1TB Hitachi SATA disk drives and 4GB of memory. Each DM is like a self-contained internal storage array and does its own caching. There is 32GB of cache overall.

- Eight racks (frames) can architecturally exist in a system, a frame being a 42U rack.

- Three redundant APC UPS modules at the base of the rack provide protection against power interruptions. Two are needed to maintain data operations.
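The per-module figures in this list multiply out to the frame totals quoted in this piece: 12 FC ports, 120 drives (hence 120TB raw) and 32GB of cache. The sketch below is only a hypothetical container that makes the multiplication explicit; the class and field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release1Frame:
    """Hypothetical summary of one Release 1 frame, using the counts listed above."""
    interface_modules: int = 3
    fc_ports_per_im: int = 4
    data_modules: int = 8
    drives_per_dm: int = 15
    drive_tb: int = 1
    cache_gb_per_dm: int = 4

    def total_fc_ports(self) -> int:
        return self.interface_modules * self.fc_ports_per_im

    def total_drives(self) -> int:
        return self.data_modules * self.drives_per_dm

    def raw_tb(self) -> int:
        return self.total_drives() * self.drive_tb

    def total_cache_gb(self) -> int:
        # Cache is local to each DM, so the total is simply the sum over DMs.
        return self.data_modules * self.cache_gb_per_dm


frame = Release1Frame()
print(frame.total_fc_ports(), frame.total_drives(), frame.raw_tb(), frame.total_cache_gb())
# 12 120 120 32
```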

A single 42U rack (a frame in IBM-speak) contains 120TB of raw storage and 51TB of usable storage, with 85 percent of the raw capacity holding mirrored data and the rest held as spare.

Each IM and DM has an embedded server featuring a CPU, DRAM and PCI-X buses. It runs a Linux-type operating system. The IMs have their own local system hard drives, whereas the DMs have none apart from their data drives and boot remotely from the IMs.

There are two internal GigE switches for redundancy. Each GigE switch is connected to a different UPS.

There is also a remote management capability.

There is no detail available on how multiple XIV frames interact.

The system runs v9.0 of the XIV software. The three IMs are the overall controllers of the system.

IBM's manufacturing capability for the R1 XIV system is fairly restricted, as building it is quite labour-intensive.

Interface Modules

Each IM has:-

- 2 dual-port 2Gbit/s FC adapters
- 2 dual-port GbE adapters
- 2 GbE ports on the system board
- Redundant power and cooling with 2 power supplies and 2 fans per power supply.

Active-active multi-pathing (load balancing) must be provided by the host, as there are no multi-path I/O drivers in the XIV software. If the host doesn't provide multi-path I/O support, then there isn't any.
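Since the XIV leaves multi-pathing to the host, load balancing happens in host software. A generic round-robin path selector, of the kind a host multi-path driver might use, could look like the following Python sketch. It is not modelled on any particular operating system's MPIO implementation, and the path names are hypothetical.

```python
from itertools import cycle


class RoundRobinMultipath:
    """Hypothetical host-side selector: spread I/O across all healthy paths in turn."""

    def __init__(self, paths: list[str]):
        self.paths = list(paths)
        self.failed: set[str] = set()
        self._order = cycle(self.paths)

    def mark_failed(self, path: str) -> None:
        self.failed.add(path)

    def next_path(self) -> str:
        # Look at each path at most once per call so we never loop forever.
        for _ in range(len(self.paths)):
            path = next(self._order)
            if path not in self.failed:
                return path
        raise RuntimeError("no healthy path to the storage system")


mp = RoundRobinMultipath(["IM1:fc0", "IM2:fc0", "IM3:fc0"])
before = [mp.next_path() for _ in range(4)]
mp.mark_failed("IM2:fc0")
after = [mp.next_path() for _ in range(3)]
print(before)  # ['IM1:fc0', 'IM2:fc0', 'IM3:fc0', 'IM1:fc0']
print(after)   # ['IM3:fc0', 'IM1:fc0', 'IM3:fc0']
```

Round-robin is the simplest active-active policy; real host drivers may also weight paths by queue depth or latency.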

Each IM works independently and functions much like a router, directing data traffic through the system. Every IM holds a distribution map, in the form of a distribution table, of where all the data in the XIV system is stored, and directs I/O to the appropriate DM based on it. The distribution table is maintained by the Manager software and backed up on the IMs' own hard drives.
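The routing role of the IMs can be pictured as a table lookup. In the hypothetical sketch below, a volume's address space is cut into fixed-size partitions (1MiB here, an assumed value) and the table records a primary and a secondary DM for each one. The sources give no detail of the real table's format, granularity or contents.

```python
from dataclasses import dataclass

PARTITION_SIZE = 1024 * 1024  # hypothetical partition size of 1MiB


@dataclass(frozen=True)
class Placement:
    primary_dm: int
    secondary_dm: int


class DistributionTable:
    """Hypothetical map from (volume, partition) to the DMs holding that data."""

    def __init__(self, num_dms: int):
        self.num_dms = num_dms
        self.entries: dict[tuple[str, int], Placement] = {}

    def assign(self, volume: str, partition: int, primary: int, secondary: int) -> None:
        for dm in (primary, secondary):
            if not 0 <= dm < self.num_dms:
                raise ValueError(f"no such data module: {dm}")
        if primary == secondary:
            raise ValueError("the two copies must live on different data modules")
        self.entries[(volume, partition)] = Placement(primary, secondary)

    def route(self, volume: str, byte_offset: int) -> Placement:
        """Find which DMs an I/O at this offset should be sent to."""
        partition = byte_offset // PARTITION_SIZE
        return self.entries[(volume, partition)]


table = DistributionTable(num_dms=8)
table.assign("vol1", 0, primary=3, secondary=6)
table.assign("vol1", 1, primary=5, secondary=2)
print(table.route("vol1", 1_500_000))  # Placement(primary_dm=5, secondary_dm=2)
```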

One IM acts as the manager, with the same Manager software running as a background application on the others. If the primary IM fails, the next one takes over the management role.
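The takeover described above can be sketched as a simple rule: the first healthy IM in a fixed order holds the manager role. This is an assumed policy for illustration only. The sources do not say how the next IM is chosen, and the sketch does not model how Manager state is replicated between IMs; the class name is hypothetical.

```python
class ManagerRole:
    """Hypothetical policy: the first healthy IM in a fixed order runs the Manager."""

    def __init__(self, im_names: list[str]):
        self.order = list(im_names)
        self.healthy = set(self.order)

    def fail(self, im: str) -> None:
        self.healthy.discard(im)

    def current_manager(self) -> str:
        for im in self.order:
            if im in self.healthy:
                return im
        raise RuntimeError("no interface module left to run the Manager")


roles = ManagerRole(["IM1", "IM2", "IM3"])
print(roles.current_manager())  # IM1
roles.fail("IM1")
print(roles.current_manager())  # IM2
```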

Each IM is a field-replaceable unit (FRU).

Data Module

A DM has 15 x 1TB SATA drives connected by dual 8GB/sec PCI-X buses to its CPU and cache (DRAM). The DM has dual 1GbE ports linking it to each internal XIV switch. The cache is local to each DM and used to stage read and write operations. This is described as a scale-out distributed cache architecture.

IBM contrasts this with the central shared cache used by high-end storage arrays from EMC and HDS, with their 16 - 6KB cache segment sizes.

When a DM receives a write request from an IM, it mirrors the write to another DM; there is no detail on how the second DM is selected. When an IM receives a read request, it sends it to the designated primary DM for that data.
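The write and read paths just described can be sketched as follows: the IM routes a write to a primary DM, which stores it and forwards a copy to a second DM, while reads go only to the primary. Because the sources say nothing about how the second DM is chosen, the sketch simply uses the next DM along, which is an arbitrary assumption. All class names are hypothetical.

```python
class DataModule:
    """Hypothetical in-memory stand-in for one DM and its cache."""

    def __init__(self, dm_id: int):
        self.dm_id = dm_id
        self.blocks: dict[int, bytes] = {}

    def write(self, partition: int, data: bytes, mirror_to: "DataModule | None" = None) -> None:
        self.blocks[partition] = data
        if mirror_to is not None:
            # The receiving DM forwards the write so that a second copy exists.
            mirror_to.write(partition, data)

    def read(self, partition: int) -> bytes:
        return self.blocks[partition]


class InterfaceModule:
    """Hypothetical IM: picks the DMs for a partition and routes I/O to them."""

    def __init__(self, dms: list[DataModule]):
        self.dms = dms

    def placement(self, partition: int) -> tuple[int, int]:
        # Assumption: primary chosen by modulo, mirror on the next DM along.
        primary = partition % len(self.dms)
        secondary = (primary + 1) % len(self.dms)
        return primary, secondary

    def write(self, partition: int, data: bytes) -> None:
        primary, secondary = self.placement(partition)
        self.dms[primary].write(partition, data, mirror_to=self.dms[secondary])

    def read(self, partition: int) -> bytes:
        primary, _ = self.placement(partition)
        return self.dms[primary].read(partition)


dms = [DataModule(i) for i in range(8)]
im = InterfaceModule(dms)
im.write(11, b"hello")
print(im.read(11), [dm.dm_id for dm in dms if 11 in dm.blocks])  # b'hello' [3, 4]
```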

Each DM is a FRU.

Mirroring and RAID

DM mirroring is not implemented in a RAID 1 way, and IBM doesn't call it RAID 1 mirroring, although the effect of having two copies of the data is the same. RAID levels are drive-level concepts, and no drive-level control exists outside the XIV data modules, which are autonomous.

There are no RAID groups. All disks protect all other disks. About 1 percent of each disk is mirrored on each of the other disks in a 120-disk XIV configuration.
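The "about 1 percent" figure follows from simple division if a disk's mirror copies are spread evenly over every other disk: with 120 disks, each of the other 119 holds 1/119 of it. If, as a further assumption, copies were also kept off the 14 other disks in the same data module, there would be 105 candidates. Both cases round to about 1 percent. The helper name is hypothetical.

```python
def share_per_peer(num_disks: int, excluded_peers: int = 0) -> float:
    """Fraction of one disk's data mirrored onto each eligible peer, assuming an even spread."""
    candidates = num_disks - 1 - excluded_peers  # a disk is never mirrored onto itself
    return 1.0 / candidates


print(f"{share_per_peer(120):.2%}")      # 0.84%
print(f"{share_per_peer(120, 14):.2%}")  # 0.95%
```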

When a drive fails, its contents are rebuilt, using the hot spare capacity and mirrored partitions, in 15 to 30 minutes for a failed 500GB drive. The failed drive's capacity is redistributed across the other disks, and all 119 remaining drives in an original 120-drive system take part in the rebuild.
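A back-of-the-envelope calculation shows why involving every surviving drive makes the rebuild quick. Assuming the failed drive's 500GB is spread evenly across the 119 survivors and using decimal units, each drive only has to supply about 4.2GB, and the quoted 15 to 30 minute window needs an aggregate of roughly 280 to 560MB/s, or under 5MB/s per drive. The function below is a hypothetical helper for that arithmetic and counts only the data read from the surviving copies.

```python
def rebuild_load(failed_gb: float, surviving_drives: int, minutes: float) -> tuple[float, float, float]:
    """Return (GB per surviving drive, aggregate MB/s, MB/s per drive) for an evenly spread rebuild."""
    per_drive_gb = failed_gb / surviving_drives
    aggregate_mb_s = failed_gb * 1000 / (minutes * 60)  # decimal units: 1GB = 1000MB
    per_drive_mb_s = aggregate_mb_s / surviving_drives
    return per_drive_gb, aggregate_mb_s, per_drive_mb_s


for minutes in (15, 30):
    gb, agg, per = rebuild_load(500, 119, minutes)
    print(f"{minutes} min: {gb:.1f}GB per drive, {agg:.0f}MB/s aggregate, {per:.1f}MB/s per drive")
```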

Management

There is an out-of-band management capability, with the Manager software running on all three IMs as described above. The main IM logs all management data and activity to the others so that another can take over if it fails. The management software is responsible for volume management, point-in-time copy and drive-failure rebuild activities. Sysadmins interact with it from their desktop PCs over an Ethernet link using a GUI; a CLI is also provided.

The XIV Manager software is not integrated with IBM's TotalStorage Productivity Center (TPC) management facility.

- - - -


[Chris Mellor.]

 


tags:  ExDS9100 XIV SAN FC iSCSI