The Nvidia SCADA scheme is ushering in GPU-controlled storage IO for AI inferencing workloads, promising faster small-block transfers than GPUDirect.
SCADA is an Nvidia term within its “Storage-Next” architecture. It stands for Scaled Accelerated Data Access and denotes a storage data IO scheme in which the GPUs in a GPU server directly initiate and control storage IO. This contrasts with GPUDirect, Nvidia’s existing protocol for speeding up storage IO. Originally, GPUs were treated as ancillary accelerators by an x86 host server, which controlled the flow of data to and from them and owned both the control path and the data path for the IO. GPUDirect took the data path away from the x86 CPU and enabled direct GPU memory-to-storage data transfer using RDMA to NVMe drives, but the CPU still owned the control path. SCADA takes the control path away from the CPU as well.
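The progression of path ownership described above can be summarized in a small sketch (a reading of this description, not an Nvidia specification):

```python
# Conceptual summary of who owns each IO path in the three models described
# above. This is an illustration of the article's description, not a spec.
io_models = {
    "classic":   {"control_path": "CPU", "data_path": "CPU (bounce via host memory)"},
    "GPUDirect": {"control_path": "CPU", "data_path": "GPU (RDMA to NVMe)"},
    "SCADA":     {"control_path": "GPU", "data_path": "GPU"},
}

for name, paths in io_models.items():
    print(f"{name:>9}: control={paths['control_path']}, data={paths['data_path']}")
```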
AI training typically needs bulk data transfers, where the control path accounts for a comparatively small share of each transfer’s time. AI inferencing needs small-block IOs, less than 4 KB, where the control path time of each transfer is relatively large. Nvidia research found that having GPUs initiate such transfers would take less time and so speed up inferencing. SCADA is the result, and an Nvidia FMS 2025 paper, “Advancing Memory and Storage Architectures for Next-Gen AI Workloads,” discusses it.
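A back-of-the-envelope model shows why small blocks are control-path-bound. The latency and bandwidth figures below are illustrative assumptions, not Nvidia’s measurements:

```python
# Illustrative model: total IO time = fixed control-path latency + transfer
# time. All numbers here are assumptions chosen to show the shape of the
# argument, not measured values.

def io_time_us(block_bytes, control_us, bus_gb_per_s=64.0):
    """Total IO time in microseconds.
    bus_gb_per_s is an assumed PCIe Gen 5 x16 effective throughput."""
    transfer_us = block_bytes / (bus_gb_per_s * 1e3)  # bytes / (bytes/µs)
    return control_us + transfer_us

CPU_CONTROL_US = 10.0  # assumed CPU-initiated control-path overhead

for block in (4 * 1024, 1024 * 1024):  # 4 KB (inference) vs 1 MB (training)
    total = io_time_us(block, CPU_CONTROL_US)
    frac = CPU_CONTROL_US / total
    print(f"{block >> 10:>5} KB block: control path is {frac:.0%} of IO time")
```

Under these assumptions the control path dominates a 4 KB transfer but is a minority of a 1 MB transfer, which is why shaving control-path latency pays off mainly for inferencing-style IO.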
Nvidia is working with storage ecosystem partners to productize SSDs and controllers that use SCADA. Marvell makes SSD controllers, and a blog by Chander Chadha, its Director of Marketing for Flash Storage Products, says: “The AI infrastructure need is prompting storage companies to develop SSDs, controllers, NAND and other technologies fine-tuned to support GPUs – with an emphasis on higher IOPS (input/output operations per second) for AI inference – that will be fundamentally different from those for CPU-connected drives where latency and capacity are the bigger focus points.”
Chadha says: “The GPU initiates storage transactions within the SCADA framework which is built around memory semantics,” meaning load and store requests to which the SSD controller has to respond.
He says current SSDs cannot respond fast enough, in IOPS terms, “for data sets smaller than 4KB which results in an underutilized PCIe bus, leading to the GPU starving for data and wasting cycles.” The GPUs could need such data to sustain more than 1,000 parallel threads in inferencing workloads. AI training with CPU-initiated transfers needs fewer. Chadha says: “The number of GPU parallel threads is much lower – tens versus thousands – and data sets are larger in size.”
Faster PCIe buses, such as PCIe 6 and 7, will help, but SSD controllers also need updating with SCADA accelerator functions and “optimal error correction schemes for smaller payloads.”
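Rough arithmetic shows why bus generation matters at these IOPS rates. The figures below are approximate raw x16 link rates and ignore protocol overhead; they are assumptions for illustration:

```python
# Rough arithmetic (approximate raw rates, protocol overhead ignored):
# how many PCIe x16 links a given small-block IOPS rate would saturate.
GEN_X16_GB_PER_S = {5: 64.0, 6: 128.0, 7: 256.0}  # approx. raw GB/s per x16 link

def links_needed(iops, block_bytes, gen):
    """x16 links of the given PCIe generation needed to carry the payload."""
    needed_gb_per_s = iops * block_bytes / 1e9
    return needed_gb_per_s / GEN_X16_GB_PER_S[gen]

# 230 million 4 KB reads per second is roughly 942 GB/s of payload traffic.
for gen in (5, 6, 7):
    print(f"PCIe Gen {gen}: {links_needed(230e6, 4096, gen):.1f} x16 links")
```

Each generation roughly halves the link count needed for the same small-block IOPS rate, which is why Gen 6 and Gen 7 matter for GPU-initiated small-block IO.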
Chadha sees SSDs emerging with controllers that can handle both types of workload, “capable of handling both PCIe and Ethernet traffic.” We should also, he says, “expect to see future work on interfacing with high bandwidth flash memory or CXL networks.”
Micron
NAND and SSD supplier Micron is also active in SCADA development. It has a PCIe Gen 6 SSD, the 9650, with “optimization for small-block operations.” The 7.68 TB model delivers up to 5.4 million random read IOPS. Micron demonstrated 44 of them delivering 230 million IOPS using the SCADA programming model at SC25.
The setup connected these SSDs to Broadcom PEX90000 PCIe Gen 6 switches inside an H3 Platform Falcon 6048 PCIe Gen 6 server, which contained three Nvidia H100 PCIe Gen 5 GPUs.
Micron says the system “demonstrates linear scaling from 1 to 44 SSDs.” The demo’s 230 million peak IOPS figure is quite close to the theoretical maximum of 237.6 million, the aggregate of 44 drives each delivering 5.4 million random read IOPS.
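The scaling claim reconciles with simple arithmetic (our check, not Micron’s methodology):

```python
# Sanity-check the scaling claim: 44 SSDs at 5.4M random-read IOPS each,
# versus the 230M IOPS Micron demonstrated at SC25.
per_drive_iops = 5.4e6
drives = 44
theoretical = per_drive_iops * drives  # 237.6 million IOPS
measured = 230e6
efficiency = measured / theoretical

print(f"theoretical: {theoretical / 1e6:.1f}M IOPS, efficiency: {efficiency:.1%}")
# -> theoretical: 237.6M IOPS, efficiency: 96.8%
```

Roughly 97 percent of the theoretical aggregate is consistent with the near-linear scaling Micron describes.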
It concludes: “Combined with PCIe Gen6, high-performance SSDs, this [SCADA] architecture enables real-time data access for workloads like vector databases, graph neural networks and large-scale inference pipelines.”
Bootnote
The SCADA acronym has traditionally stood for Supervisory Control and Data Acquisition in the industrial telemetry world. Nvidia’s usage is different but analogous.