
High Bandwidth Flash is years away despite its promise


Analysis. High Bandwidth Flash (HBF) promises extraordinary capacity but faces extraordinary complexity.

High Bandwidth Flash (HBF) will stack multiple NAND dies – each itself composed of hundreds of stacked 3D NAND cell layers – creating unprecedented memory capacity alongside daunting engineering challenges.

Professor Jung-Ho Kim of KAIST’s Department of Electrical and Electronic Engineering explained HBF’s development as a complement to GPU High Bandwidth Memory (HBM) in the Korean media outlet EEWorld.

HBM is expensive. It consists of stacked layers of planar (2D) DRAM connected to a base logic die by vertical channels called Through-Silicon Vias (TSVs), as the diagram below illustrates:

The DRAM stack and logic die sit on an interposer, a semiconductor device that links them to a processor, the GPU in this case. The HBM advantage is that it provides far more processor-to-memory bandwidth than an x86 CPU’s memory socket scheme. GPUs have hundreds or thousands of cores, compared to the tens in a modern x86 CPU, and all of them need to access memory.
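
To put rough numbers on that gap, here is a back-of-the-envelope sketch; the DDR5 channel count, pin rates, and HBM stack count are illustrative assumptions rather than any particular product’s specifications.

```python
# Back-of-the-envelope comparison of x86 socket memory bandwidth vs GPU HBM.
# All figures below are illustrative assumptions, not vendor datasheet values.

ddr5_channel_gbps = 51.2   # one DDR5-6400 channel: 8 bytes x 6,400 MT/s
cpu_channels = 8           # assumed memory channels on a server-class x86 socket
cpu_tbps = ddr5_channel_gbps * cpu_channels / 1000

hbm_stack_tbps = 1.0       # assumed ~1 TB/s for an HBM3E-class stack
gpu_stacks = 6             # assumed HBM stacks on a high-end GPU package
gpu_tbps = hbm_stack_tbps * gpu_stacks

print(f"x86 socket: ~{cpu_tbps:.2f} TB/s")   # ~0.41 TB/s
print(f"GPU HBM:    ~{gpu_tbps:.1f} TB/s")   # ~6.0 TB/s
```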

Current HBM3E has 8 to 16 layers; SK hynix’s 16-Hi device delivers 48 GB of capacity. HBM4 could have similar capacity but double the bandwidth, at 2 TBps instead of 1 TBps. The HBM5 generation will require more than 4,000 TSVs through the DRAM stack layers.
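
Those per-stack figures fall out of simple multiplication. The sketch below reproduces them from an assumed per-die capacity, interface width, and pin rate; the specific values are assumptions chosen to match the numbers quoted above.

```python
# Reproducing the quoted HBM stack figures from assumed per-die and per-pin values.

die_capacity_gb = 3                      # assumed 24 Gbit (3 GB) DRAM die
print(16 * die_capacity_gb, "GB")        # 16-Hi stack -> 48 GB

def stack_bandwidth_tbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak per-stack bandwidth in TB/s from interface width and per-pin rate."""
    return bus_width_bits * pin_rate_gbps / 8 / 1000

print(stack_bandwidth_tbps(1024, 8.0))   # HBM3E-class interface: ~1 TB/s
print(stack_bandwidth_tbps(2048, 8.0))   # HBM4-class, doubled interface width: ~2 TB/s
```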

Kim presented a view of the HBM6 to HBM8 generations in a KBS YouTube video. Each HBM generational advance will require more complexity at the memory stack, logic die, and interposer levels:

Bear this in mind as we move on to HBF. The HBF idea is to provide more memory for GPUs by using flash, which is cheaper to make than DRAM, albeit slower to access. HBF would stack NAND dies on a base logic die, with the stack again routed to the GPU via an interposer. Conceptually, we can envisage this kind of setup:

Currently NAND, in its 3D NAND incarnation, is itself composed of stacked layers:

Diagram based on a ResearchGate source

SK hynix is shipping a 238-layer product in a 512 Gb (64 GB) die using TLC flash, and has 321-layer technology coming. The stack of memory cells sits above a peripheral logic base layer.

Consider a 12-Hi HBF stack: 12 3D NAND dies totalling 2,856 cell layers (using 238-layer NAND) with 768 GB of capacity. A 16-Hi stack of 321-layer 3D NAND would have 5,136 cell layers overall and would likely surpass a terabyte of capacity.
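
The same arithmetic produces those hypothetical HBF totals. In the sketch below, the layer counts and the 512 Gb (64 GB) die follow the figures above, while the 1 Tb (128 GB) capacity assumed for a 321-layer die is an illustrative guess.

```python
# Hypothetical HBF stack totals: total cell layers and raw capacity.

def hbf_stack(nand_dies: int, cell_layers_per_die: int, die_capacity_gb: int):
    """Total 3D NAND cell layers and raw capacity for a stack of NAND dies."""
    return nand_dies * cell_layers_per_die, nand_dies * die_capacity_gb

# 12-Hi stack of 238-layer, 512 Gb (64 GB) TLC dies
print(hbf_stack(12, 238, 64))     # (2856, 768)  -> 2,856 layers, 768 GB

# 16-Hi stack of 321-layer dies, assuming a 1 Tb (128 GB) die
print(hbf_stack(16, 321, 128))    # (5136, 2048) -> 5,136 layers, 2 TB
```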

The interconnection plumbing here will be extraordinarily complex. The SK hynix 512 Gb die is a single stack, and each of the NAND strings above the base layer has its own etched vertical channel connecting it to the base logic die.

Imagine having two of these dies layered one above the other. The upper die’s peripheral logic layer has to be linked to the interposer at the bottom. Do these links pass through the lower NAND die, or go around it? Either way would increase the 2D dimensions of the overall device. The interposer also now has to carry the signals from two 3D NAND dies to the GPU, increasing its complexity.

Now make the problem worse and envisage a 12-Hi HBF stack: 12 3D NAND dies, each needing to connect to the interposer. The device size grows larger still, and the interposer becomes more complex again.

GPU-to-HBM-and-HBF connectivity requires sophisticated coordination. Nvidia, as the dominant GPU manufacturer, would need deep involvement. A standard is essential so multiple suppliers can compete, preventing monopoly pricing.

This explains why Sandisk and SK hynix are active in HBF standardization, and suggests HBF remains two or more years from commercial availability.