BAM


BAM – Big Accelerator Memory. This is a GPU-initiated storage-to-GPU data transfer scheme in which data and control plane traffic flow directly to the GPU over the PCIe switch complex, bypassing a GPU server's x86 CPU and its memory. The concept is also known as BaM – Block-accessed Memory – and that is the form discussed here.

In the current approach, the CPU owns the entire control path and the GPU is used as an assistant. Yet the majority of the processing work happens on the GPU, and it is inefficient to load petabytes of data into it through CPU-managed tiling.

What applications need is GPU-centric access with the notion of an effectively infinite memory pool, providing:

  • Ability to fetch only the data needed, on demand, during computation
  • Ability to scale compute and data pipelines independently

So the idea is to move the majority of the control path to the GPU, using the CPU only for housekeeping.
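To make the access pattern concrete, here is a minimal CUDA sketch of the idea: GPU threads decide during computation which storage blocks they need and read them directly, with no CPU-side tiling loop. The real BaM stack does this against NVMe SSDs through its own runtime; in this sketch the "SSD" is simulated by host-pinned, zero-copy mapped memory, and the names used (sum_selected_blocks, BLOCK_BYTES and so on) are illustrative, not part of any actual BaM API.

```cuda
// Sketch only: the "SSD" is a host-pinned, GPU-mapped buffer so the on-demand
// access pattern can be shown in plain CUDA without the real BaM/NVMe stack.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

constexpr size_t BLOCK_BYTES    = 4096;            // storage block size
constexpr size_t NUM_BLOCKS     = 1 << 14;         // 64 MiB simulated "SSD"
constexpr size_t FLOATS_PER_BLK = BLOCK_BYTES / sizeof(float);

// Each thread decides, during computation, which storage block it needs and
// reads it on demand through the mapped pointer - no CPU-side tiling loop.
__global__ void sum_selected_blocks(const float *ssd, const uint32_t *wanted,
                                    float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const float *blk = ssd + (size_t)wanted[i] * FLOATS_PER_BLK;  // on-demand "fetch"
    float s = 0.0f;
    for (size_t j = 0; j < FLOATS_PER_BLK; ++j) s += blk[j];
    out[i] = s;
}

int main()
{
    // Host-pinned, GPU-mapped buffer standing in for NVMe-resident data.
    float *ssd_h = nullptr;
    cudaHostAlloc((void **)&ssd_h, NUM_BLOCKS * BLOCK_BYTES, cudaHostAllocMapped);
    for (size_t i = 0; i < NUM_BLOCKS * FLOATS_PER_BLK; ++i) ssd_h[i] = 1.0f;

    float *ssd_d = nullptr;
    cudaHostGetDevicePointer((void **)&ssd_d, ssd_h, 0);  // GPU-visible view of the "SSD"

    const int n = 1024;                                    // reads issued by GPU threads
    uint32_t *wanted = nullptr; float *out = nullptr;
    cudaMallocManaged(&wanted, n * sizeof(uint32_t));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) wanted[i] = (i * 37u) % NUM_BLOCKS;  // sparse, data-dependent picks

    sum_selected_blocks<<<(n + 255) / 256, 256>>>(ssd_d, wanted, out, n);
    cudaDeviceSynchronize();
    printf("block %u sum = %.0f (expect %zu)\n", wanted[0], out[0], FLOATS_PER_BLK);

    cudaFreeHost(ssd_h); cudaFree(wanted); cudaFree(out);
    return 0;
}
```

The point is the control flow: deciding which block is needed and fetching it both happen on the GPU, while the CPU only stages the simulated store and launches the kernel.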

Nvidia graphic.

The thinking is that a GPU can drive high levels of IO traffic. The maximum IOPS a single x86 CPU core can sustain is around 1 million, so 100 million IOPS would need 100 cores devoted to IO alone. AI accelerators (GPUs) have tens of thousands of cores and can use them for massively parallel IO. Faster SSDs are therefore needed to keep up with AI workload demands, along with faster data transmission from them to the GPUs.
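The parallelism argument can also be sketched in CUDA. Below, each GPU thread claims a slot in a submission queue and posts one 4 KiB read request, so 131,072 requests are generated in a single kernel launch. The queue is just a buffer in GPU memory standing in for an NVMe queue pair; the IoRequest struct and submit_reads kernel are hypothetical illustrations, not part of any shipping driver or the BaM library.

```cuda
// Sketch only: the submission queue is a plain GPU-memory buffer standing in
// for an NVMe queue pair; a real GPU-initiated IO stack would ring doorbells
// on the SSD itself.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// One entry in a simulated NVMe-style submission queue.
struct IoRequest { uint64_t lba; uint32_t num_blocks; };

// Every thread turns one unit of work into one IO request: request-level
// parallelism scales with GPU thread count rather than CPU core count.
__global__ void submit_reads(IoRequest *sq, unsigned int *sq_tail, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int slot = atomicAdd(sq_tail, 1u);   // claim the next queue slot
    sq[slot].lba        = (uint64_t)i * 8;        // 8 x 512 B sectors = one 4 KiB read
    sq[slot].num_blocks = 8;
}

int main()
{
    const int n = 1 << 17;                        // 131,072 GPU threads
    IoRequest *sq = nullptr;
    unsigned int *tail = nullptr;
    cudaMalloc(&sq, n * sizeof(IoRequest));
    cudaMallocManaged(&tail, sizeof(unsigned int));
    *tail = 0;

    submit_reads<<<(n + 255) / 256, 256>>>(sq, tail, n);
    cudaDeviceSynchronize();
    printf("%u read requests generated in one kernel launch\n", *tail);

    cudaFree(sq); cudaFree(tail);
    return 0;
}
```

Whether the drives can absorb a request stream like this is the other half of the problem, which is why faster SSDs and faster paths from them to the GPU matter.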

Micron BAM graphic.