Analysis: Sandisk has published a blog about an interview with high bandwidth memory (HBM) pioneer Professor Joung-ho Kim of KAIST’s Department of Electrical and Electronic Engineering. KAIST is the Korea Advanced Institute of Science and Technology, a national research university, and Kim was instrumental in HBM’s development. He is now involved with high bandwidth flash (HBF) technology, which Sandisk sees as an answer to the GPU HBM wall problem: AI workload context memory (the KV cache) overflowing HBM capacity and forcing time-consuming recomputation of vectors.
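To see why the cache overflows, some rough arithmetic helps. The sketch below is a back-of-envelope illustration only; the model configuration (an 80-layer, 70B-class transformer with grouped-query attention, FP16 values) is our assumption, not a figure from the Sandisk blog.

```python
# Back-of-envelope KV cache sizing for a hypothetical 70B-class model
# (80 layers, 8 grouped-query KV heads of dimension 128, FP16 values).
# These figures are illustrative assumptions, not a specific product spec.

def kv_cache_bytes(seq_len, batch,
                   n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # 2x for the key and value tensors kept per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch

gib = 1024 ** 3
for ctx in (8_000, 32_000, 128_000):
    size = kv_cache_bytes(seq_len=ctx, batch=16)
    print(f"{ctx:>7} tokens x 16 sequences -> {size / gib:6.1f} GiB of KV cache")

# At 128K-token contexts and modest concurrency the cache alone runs to
# hundreds of GiB, well past the HBM on a single GPU, so entries are either
# evicted (and later recomputed) or spilled to a slower tier.
```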
We wrote about this late last year and noted its development would be complex. Nvidia, the largest HBM buyer, has not yet expressed any public interest in the technology.
Since then, Nvidia has devised its Inference Context Memory Storage Platform (ICMSP), a context memory extension technology that uses DPU-connected NVMe SSDs to hold key:value cache data overflowing from HBM and the GPU server’s DRAM. ICMSP is effectively higher-bandwidth, lower-latency flash than a standard SSD: the BlueField-4 DPU to which the SSDs are attached is a storage accelerator, connected to the GPUs in a Vera Rubin pod by Spectrum-6 Ethernet, using photonics and running at 800 Gbps per port.
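The ordering of the tiers is the point here. A rough comparison is sketched below; the figures are approximate, generation-dependent, and not Nvidia-published specifications.

```python
# Rough bandwidth ordering behind the ICMSP idea. Numbers are approximate
# and used only to show why DPU-attached flash over 800 Gbps Ethernet sits
# between local HBM and an ordinary directly attached SSD.
tiers_gb_per_s = {
    "HBM stacks on one GPU (order of magnitude)": 8000,  # several TB/s
    "One 800 Gbps Ethernet port":                  100,  # 800 Gbit/s / 8
    "Single PCIe Gen5 x4 NVMe SSD":                 14,  # ~14-16 GB/s sequential
}
for tier, bw in tiers_gb_per_s.items():
    print(f"{tier:45s} ~{bw:>5} GB/s")
```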
With this background, let’s look at the Sandisk blog and see what it can tell us about HBF prospects.
Prof. Kim outlines the need for technology like HBF, saying: “In the (AI) transformer model, especially in inference cases, it is memory-bound. So rather than having a lot of computation, they spend more time bringing data from the memory and writing process. Bandwidth is limiting them.”
“Unfortunately, most inference and training processes, as well as performance, are limited by memory. That means we need more memory innovation. But in the memory world, we have SRAM, DRAM, and NAND flash memories. And we have to somehow design those connections.”
Kim thinks: “The computing innovation will be driven mostly by memory architecture. That I strongly believe.”
He has outlined a model in which 100GB of HBM acts as a cache in front of a 1TB layer of HBF, and notes: “The challenge is that the GPU has to accept that new architecture. That is the best for them. … Also, developers will have to change the software to optimize the software and hardware together. For example, some data has to be directly connected and transmitted from HBF to HBM. So, they need a new instruction set and circuit to support them. They have to accept those kinds of new parameters.”
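In software terms, Kim’s model behaves like a two-tier cache with the HBM in front of the HBF. The sketch below is purely illustrative; there is no published HBF programming interface, so the class, methods, and tier sizes are our assumptions.

```python
from collections import OrderedDict

# Illustrative only: a software analogue of the 100GB-HBM-over-1TB-HBF model
# Kim describes. The tier sizes (counted in entries here) and the get/put
# interface are assumptions made for the sketch, not a real HBF API.

class TwoTierKVStore:
    def __init__(self, hbm_capacity, hbf_capacity):
        self.hbm = OrderedDict()          # fast tier, used as an LRU cache
        self.hbf = OrderedDict()          # large, slower backing tier
        self.hbm_capacity = hbm_capacity  # e.g. ~100GB worth of entries
        self.hbf_capacity = hbf_capacity  # e.g. ~1TB worth of entries

    def get(self, key):
        if key in self.hbm:               # hit in the fast tier
            self.hbm.move_to_end(key)
            return self.hbm[key]
        if key in self.hbf:               # promote from HBF into HBM
            value = self.hbf.pop(key)
            self.put(key, value)
            return value
        return None                       # miss in both tiers: recompute

    def put(self, key, value):
        self.hbm[key] = value
        self.hbm.move_to_end(key)
        while len(self.hbm) > self.hbm_capacity:
            old_key, old_value = self.hbm.popitem(last=False)
            self.hbf[old_key] = old_value  # demote rather than discard
            while len(self.hbf) > self.hbf_capacity:
                self.hbf.popitem(last=False)
```

A miss in both tiers forces the recomputation that the extra HBF capacity is meant to avoid, and doing the equivalent in hardware is what calls for the new instructions and circuits Kim mentions.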
That makes HBF adoption more complex.
Stepping back, we could say that HBF technology development is a multi-year effort. It requires a GPU manufacturer, such as Nvidia, to surround its GPU with a set of HBM dies and surround those in turn with a set of HBF dies if the HBM caches the HBF, or to provide direct GPU-to-HBF connectivity if it doesn’t, making GPU memory management more complex. We’re looking at a lot of semiconductor-level work here.
We note that SK hynix and Nvidia are working together on a 100 million IOPS AI SSD (AIN-P) concept. If that SSD were used in Nvidia’s BlueField-4-connected ICMSP, there might be no need for HBF.
We think that if there is a general SSD industry standard for HBF and Nvidia adopts HBF as a technology direction, then HBF technology has a future. Absent those two things, it will struggle.