Data Management

SCADA


SCADA – Scaled Accelerated Data Access – an Nvidia architecture for accelerating AI inferencing I/O to its GPUs. In the Nvidia Blackwell GPU environment this is a client-server runtime that runs on GPUs, functioning as a multilevel cache between the PCIe bus, CPU, and storage on one side and the 100,000+ threads driving random I/Os in a GPU kernel on the other:

  • It coalesces I/O requests within the GPU and maintains a read-through cache, converting random I/Os into either local cache hits within the GPU or batches of I/Os that are packed together before being passed over PCIe to either local NVMe or a remote SCADA server (see the sketch after this list).
  • It takes full ownership of NVMe block devices and implements an NVMe driver inside the GPU. This keeps random I/Os from having to be processed on the host CPU.
  • It enables peer-to-peer PCIe in a way analogous to GPUDirect. This avoids sending I/Os all the way to host memory, and keeps traffic between GPUs and storage local to the PCIe switch they share.
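
To make the coalescing idea concrete, here is a minimal CPU-side Python sketch, assuming a block-granular read-through cache in front of a flat file. The names (BlockCache, read_many) and the 4 KiB block size are illustrative choices, not Nvidia’s SCADA API: many fine-grained requests are deduplicated into one sorted batch of block reads, and subsequent requests are served from cache.

```python
# Illustrative only: BlockCache is a hypothetical stand-in for SCADA's
# GPU-resident read-through cache; it is not Nvidia's API.
from collections import OrderedDict

BLOCK = 4096  # assume 4 KiB cache blocks

class BlockCache:
    """LRU read-through cache keyed by block number."""
    def __init__(self, path, capacity=1024):
        self.f = open(path, "rb")
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> bytes

    def read_many(self, offsets, size=16):
        """Serve many small random reads, fetching misses as one sorted batch."""
        wanted = {off // BLOCK for off in offsets}
        misses = sorted(b for b in wanted if b not in self.blocks)
        for b in misses:                          # one batched pass, not N seeks
            self.f.seek(b * BLOCK)
            self.blocks[b] = self.f.read(BLOCK)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)   # evict least recently used
        out = []
        for off in offsets:                       # every request is now a hit
            b, r = divmod(off, BLOCK)
            self.blocks.move_to_end(b)
            out.append(self.blocks[b][r:r + size])
        return out

# cache = BlockCache("data.bin")            # hypothetical data file
# records = cache.read_many([40960, 81920, 40976])
```

In SCADA the equivalent batching happens inside the GPU kernel across threads, and a miss batch goes over PCIe to NVMe or a SCADA server rather than through a file read on the host.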

There are a couple of places during LLM inferencing where small, random reads have to happen repeatedly:

  1. KV cache lookups. As an LLM such as ChatGPT builds its response to a question word by word, the model needs to reference all the previous words in the conversation to decide what comes next. It doesn’t recompute everything from scratch; instead, it looks up cached intermediate results (the key and value vectors) from earlier in the conversation. These lookups involve many small reads from random places each time a new word is generated.
  2. Vector similarity search. When you upload a document to the LLM, the document gets broken into chunks, and each chunk is turned into a vector and stored in a vector index. When you then ask a question, it’s also turned into a vector, and the vector database searches the index to find the most similar chunks, a process that requires comparing the query vector against many small vectors stored at unpredictable locations. Sketches of both access patterns follow this list.
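
A rough feel for pattern 1, as a Python sketch assuming a paged KV cache of the kind serving engines use (the page size, pool layout, and function names here are invented for illustration): key vectors for past tokens sit in fixed-size pages scattered across a pool, so generating each new token triggers a growing number of small, scattered reads.

```python
# Toy paged KV cache; layout and sizes are illustrative assumptions.
import numpy as np

DIM, PAGE, SLOTS = 64, 16, 1024          # head dim, tokens per page, pool size
pool = np.zeros((SLOTS, PAGE, DIM), np.float32)
page_table = []                          # logical page -> pool slot
free_slots = list(np.random.permutation(SLOTS))

def append_token(pos, key):
    if pos % PAGE == 0:                  # a new logical page lands anywhere
        page_table.append(free_slots.pop())
    pool[page_table[pos // PAGE], pos % PAGE] = key

def gather_keys(n):
    """Attention for token n reads every earlier key: n small scattered reads."""
    return np.stack([pool[page_table[p // PAGE], p % PAGE] for p in range(n)])

append_token(0, np.random.rand(DIM).astype(np.float32))
for pos in range(1, 100):
    keys = gather_keys(pos)              # pos scattered lookups this step
    append_token(pos, np.random.rand(DIM).astype(np.float32))
print(keys.shape)                        # (99, 64): reads grow with context
```

And pattern 2, as a brute-force sketch: chunk vectors live in an on-disk index (a numpy memmap standing in for a vector store; the file name, sizes, and random candidate set are assumptions), and answering a query means fetching many small vectors from unpredictable offsets and scoring them against the query.

```python
# Toy vector similarity search over an on-disk index; names/sizes assumed.
import numpy as np

DIM, N = 128, 10_000
vecs = np.random.rand(N, DIM).astype(np.float32)      # build a toy index once
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
vecs.tofile("index.f32")

index = np.memmap("index.f32", dtype=np.float32, mode="r", shape=(N, DIM))
query = vecs[42]                                      # already unit length

cands = np.random.choice(N, 256, replace=False)       # an ANN would pick these
fetched = np.asarray(index[cands])                    # 256 small random reads
scores = fetched @ query                              # cosine (unit vectors)
print("top-5 chunk ids:", cands[np.argsort(scores)[-5:][::-1]])
```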

Just as GPUDirect Storage has become essential for efficient bulk data loading during training, SCADA is likely to become an essential part of efficient inferencing in the presence of a lot of context, as is the case when using both RAG and reasoning tokens.

[Thanks to a Glenn Lockwood blog post.]

More information: SCADA is a client-server scheme for getting data from a storage server, running SCADA server software, to a GPU server, running SCADA client software, and it provides fine-grained, accelerated, GPU-initiated access to stored data. It is a specialized technology framework developed by Nvidia as part of its CUDA ecosystem and was introduced around 2024. It addresses the challenges of handling massive datasets in GPU-accelerated computing environments, particularly for applications whose data volumes exceed available memory.

Nvidia notes that feeding large data to GPUs requires support for 100,000+ fine-grained GPU accesses to datasets that no longer fit in memory, as well as securing accesses to the GPU. New applications (GNNs, vector databases) make fine-grained requests from every GPU thread to more data than can fit in the memory of many nodes. The SCADA programming model avoids painful out-of-memory errors with load/store semantics and leverages NVMe to reduce total cost of ownership.
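
The load/store point can be illustrated by analogy with memory mapping, where code addresses a file as if it were an in-memory array and pages are faulted in only when touched. This is a sketch of the programming model’s feel, with numpy’s memmap standing in for SCADA’s demand-paged access, not SCADA itself (the file name and sizes are invented):

```python
# Analogy only: memmap plays the role of SCADA's demand-paged access.
import numpy as np

# Create a toy file once; in practice this would be far larger than RAM.
np.arange(1_000_000, dtype=np.float32).tofile("embeddings.f32")

big = np.memmap("embeddings.f32", dtype=np.float32, mode="r")
row = big[765_432:765_432 + 128]   # a "load": only the touched pages are read
print(float(row.mean()))           # no OOM even if the file dwarfed memory
```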

SCADA enables GPUs to directly and efficiently access large-scale datasets from storage without relying on CPU intermediaries, which traditionally introduce bottlenecks and overhead. It uses GPUDirect Storage extensions to allow up to 100,000 fine-grained GPU threads to pull data directly from storage, bypassing CPU involvement for “speed-of-light” performance. It also automatically scales data and compute resources, making it ideal for scenarios where datasets are too large to fit in GPU memory.

SCADA also provides a single, unified API for data access that works seamlessly regardless of dataset size or compute cluster scale. This allows users to handle everything from single-node setups (e.g., 10 TB datasets) to distributed clusters without out-of-memory (OOM) errors or major code changes.
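
A hypothetical facade shows what “one API at any scale” can look like in spirit: callers index the dataset identically whether it fits in memory or must be demand-paged from disk. The function name, threshold, and backends below are illustrative assumptions, not SCADA’s actual interface:

```python
# Hypothetical unified-access facade; not Nvidia's SCADA API.
import os
import numpy as np

def open_dataset(path, dtype=np.float32, in_mem_limit=1 << 30):
    """Small files load fully; big ones are demand-paged. Same interface."""
    if os.path.getsize(path) <= in_mem_limit:
        return np.fromfile(path, dtype=dtype)       # fits: plain in-memory array
    return np.memmap(path, dtype=dtype, mode="r")   # too big: paged on demand

# data = open_dataset("embeddings.f32")   # hypothetical file
# chunk = data[1000:1128]                 # identical call either way, no OOM
```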

Nvidia graphic.

The data path protocol is implemented with DMA over PCIe or RDMA over InfiniBand or Ethernet. The control path protocol is implemented with secure IPC and/or RDMA. There is also a new GPU-oriented proprietary protocol that takes advantage of GPU parallelism and reduces the number of ‘doorbell rings.’
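
The ‘doorbell’ remark refers to the notification write that tells a device its submission queue has new entries; ringing once per request squanders the queue’s parallelism. Here is a minimal Python sketch of the batching idea, with all names invented (the real SCADA protocol is proprietary):

```python
# Illustrative submission queue; SubmissionQueue and its methods are invented.
class SubmissionQueue:
    def __init__(self):
        self.ring, self.doorbell_rings = [], 0

    def submit(self, request):
        self.ring.append(request)         # queued, but no doorbell yet

    def ring_doorbell(self):
        self.doorbell_rings += 1          # one notification covers the batch
        batch, self.ring = self.ring, []
        return batch                      # "device" consumes the whole batch

sq = SubmissionQueue()
for off in range(0, 64 * 4096, 4096):
    sq.submit(("read", off, 4096))        # 64 requests queued by GPU threads
done = sq.ring_doorbell()                 # a single doorbell ring for all 64
print(len(done), "requests,", sq.doorbell_rings, "doorbell ring(s)")
```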