Nvidia GTC storage news roundup
Nvidia's 2026 GTC has seen a blizzard of storage-related announcements. Here are ones from Cisco, Hammerspace, Kioxia, MinIO, NetApp, Nvidia itself, Samsung, Supermicro, VAST Data, VDURA and WEKA.
…
Cisco has a new validated design for the VAST AI OS on Cisco AI PODs. This design validates the Cisco EBox running VAST AI OS as the shared data foundation for Cisco AI PODs. Built on Cisco UCS, the EBox provides an all-flash data plane for VAST AI OS, while Cisco CNode-X brings GPU-accelerated data services directly to that foundation. For example, VAST InsightEngine running on the Cisco UCS C845A M8 brings vector search, accelerated analytics, event-driven services, and richer AI data pipelines directly into the platform. To learn more about Cisco’s approach to AI infrastructure validation, visit the Cisco Validated website and look under Design guides for AI-ready infrastructure.
...
Hammerspace announced general availability of its new turnkey AI Data Platform (AIDP) offering, which uniquely uses data in place, eliminating the need to purchase massive amounts of new flash just to house AI data. The Hammerspace AIDP is built on Nvidia's reference design, including RTX PRO 6000 and RTX PRO 4500 Blackwell Server Edition GPUs, NIM microservices and NeMo Retriever. Hammerspace converges data management with data orchestration across heterogeneous storage to simplify and automate the data pipeline and deliver the security, governance and content indexing required for high-performance inference, retrieval-augmented generation (RAG) and agentic AI.
David Flynn, Hammerspace Founder and CEO, said: "Hammerspace is the only AI Data Platform that can access data anywhere across edge devices, data centers and clouds, across high-performance file and object storage, without forcing enterprises into a copy-first AI silo. We overcome data gravity by continuously identifying the data that matters, orchestrating it efficiently to GPUs, and enabling processing where it’s most optimal, whether that’s local GPU resources near the data or centralized GPUs at scale.”
The Hammerspace AIDP is integrated with Secuvy’s Data Security Posture Management (DSPM) technology.
Find out more in a Hammerspace blog.
...
Kioxia is demoing an evaluation sample of its forthcoming super-high-IOPS GP Series SSD at GTC. The drive will be built with Kioxia's XL-Flash storage-class memory as part of the company's participation in Nvidia's Storage-Next initiative. The GP Series will let GPUs indirectly access high-speed flash memory as an expansion to High Bandwidth Memory (HBM) in AI systems, providing larger GPU-accessible memory capacity and faster data access for AI workloads.
The GP Series will deliver higher IOPS, finer-grained data access (512 bytes), and lower power consumption per IO compared with Kioxia's conventional TLC SSDs. It will connect to other, lower-tier SSDs and to a GPU server's CPU system over PCIe Gen 5. Its capacity will be 800 GB of SLC flash and 1,600 GB of MLC (2 bits/cell) flash, and it will come in an E1.S or E3.S form factor. Evaluation samples of the Kioxia GP Series will be available to select customers by the end of 2026.
Kioxia says its CM9 Series PCIe 5.0 E3.S SSD, offering 25.6 TB TLC capacity with 3 DWPD endurance, provides the performance, capacity, and endurance needed to support KV Cache extension workloads in Nvidia’s Context Memory Storage (CMX) scheme.
…
Kioxia demonstrated high-dimensional vector search scaling to 4.8 billion vectors on a single query server in a Milvus vectorDB environment powered by GPU acceleration, using its open-source AiSAQ approximate nearest neighbor search (ANNS) technology.
It also demonstrated a significant reduction in index build time by leveraging GPU acceleration through Nvidia cuVS, with up to 20x improvement in AiSAQ index build time for high-dimensional vectors of 1024 dimensions, and up to 7.8x improvement in end-to-end build times. This 20x improvement represents a reduction from 28.4 days using CPU to 1.4 days using four Hopper GPUs to build the index, and a reduction from 31 days to 4 days in end-to-end testing.
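The quoted speedups are consistent with the reported build times; a quick arithmetic check of the figures above:

```python
# Back-of-the-envelope check of the AiSAQ build-time speedups reported above.
cpu_index_days, gpu_index_days = 28.4, 1.4   # CPU vs four Hopper GPUs
cpu_e2e_days, gpu_e2e_days = 31.0, 4.0       # end-to-end testing

index_speedup = cpu_index_days / gpu_index_days   # quoted as "up to 20x"
e2e_speedup = cpu_e2e_days / gpu_e2e_days         # quoted as "up to 7.8x"

print(f"Index build speedup: {index_speedup:.1f}x")   # ~20.3x
print(f"End-to-end speedup:  {e2e_speedup:.2f}x")     # ~7.75x
```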
Read a Kioxia blog for in-depth background on the index-building system and testing.
…
MinIO announced its AIStor will support object data stores for Nvidia's new STX reference architecture: a rack-scale blueprint for enterprise AI systems powered by Vera Rubin GPUs, the BlueField-4 DPU and Spectrum-X Ethernet networking.
AIStor runs natively inside the BlueField-4 DPU; instead of operating on separate storage servers, it runs directly within the networking layer of the system, placing storage in the AI data path and getting data to the GPUs faster. This scheme reduces the storage layers that can slow AI workloads, eliminates the need for dedicated storage servers, makes storage part of the AI fabric, and enables fast, continuous access to the context data required for RAG pipelines and agentic systems.
A MinIO blog provides more information.
...
NetApp is launching the NetApp AI Data Engine (AIDE), a secure, unified AI data platform stack co-engineered with Nvidia and integrated with its AI Data Platform reference design. NetApp's AIDE automatically creates and continuously updates a global metadata catalog with powerful search capabilities. This catalog goes beyond standard file system metadata, actively analyzing file content to semantically enrich metadata in place, rather than moving the data multiple times. The AIDE metadata enables enterprises to find, curate, use, and govern data to use the correct and most up-to-date data throughout the AI data pipeline—from selection through transformation, retrieval and serving to AI applications and agents.
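The enrich-in-place idea can be illustrated with a toy sketch (not NetApp's implementation; simple keyword extraction stands in for real semantic analysis, and all names are hypothetical): content-derived tags are attached to a catalog entry while the data itself stays where it is.

```python
# Toy sketch of "enriching metadata in place": derive semantic tags from
# file content and attach them to a catalog entry, rather than copying
# the data elsewhere. Keyword extraction stands in for real semantic
# analysis; all names and paths are hypothetical.
CATALOG = {}

def enrich(path, content):
    words = [w.strip(".,").lower() for w in content.split()]
    keywords = sorted({w for w in words if len(w) > 6})[:5]
    CATALOG[path] = {
        "size": len(content),
        "keywords": keywords,   # semantic enrichment beyond filesystem metadata
    }

enrich("/vol1/report.txt", "Quarterly revenue forecast for datacenter products")

# Search the catalog, not the data: find files tagged with "forecast"
hits = [p for p, meta in CATALOG.items() if "forecast" in meta["keywords"]]
assert hits == ["/vol1/report.txt"]
```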
Customers will be able to use NetApp AIDE across their data estate with the new RTX PRO 4500 Blackwell Server Edition GPUs announced today at Nvidia GTC and RTX PRO 6000 Blackwell Server Edition GPUs. NetApp AIDE will also quickly support deployments directly into new and existing NetApp storage environments—including AFF A-Series, AFF C-Series, and FAS.
NetApp AIDE will be launching this month for an initial wave of customers and partners, with broad availability coming in the early summer. Over the next few months, NetApp AIDE will extend to support an increasing number of deployment options, giving customers broad infrastructure flexibility.
AIDE will include new multimodal data capabilities, extending to visual data. Agentic AI support will enable seamless, secure, governed agentic workflows on enterprise data across a global NetApp data estate, supporting popular, industry-standard protocols.
The company will deliver integrations with a number of ISV partners, both on-premises and in the cloud, available soon, including AI app development platforms and frameworks built on hyperscale cloud services, such as Azure-based AI applications, Google Cloud’s Vertex AI platform, and LangChain.
NetApp will support Nvidia STX, a modular, rack-scale storage reference architecture for agentic AI. STX is built with Vera Rubin GPUs and BlueField-4 DPUs, and will deliver a high-performance data engine with a specialized memory tier for KV Cache storage.
Read more about NetApp AIDE here.
...
The new-generation NetApp EF-Series all-flash EF50 and EF80 models deliver 250 percent more read and write bandwidth than the existing EF300 and EF600 arrays. The EF- and E-Series run the Santricity OS, not ONTAP, and can be viewed as stripped-down, go-faster NetApp storage hardware.
The EF systems are faster than the hybrid SSD/HDD E-Series, and there are four existing EF systems:
- EF300C with 550,000 IOPS, 20 GBps read, 9 GBps write
- EF300 with 670,000 IOPS, 20 GBps read, 9 GBps write
- EF600C with 1 million IOPS, 44 GBps read, 13 GBps write
- EF600 with 2 million IOPS, 44 GBps read, 13 GBps write
NetApp says the EF50 and EF80 deliver 100 GBps read and 57 GBps write bandwidth, and provide up to 1.5 PB of storage in 2U, the same as the EF300C and EF600C.
It does not provide IOPS numbers, despite Sandeep Singh, SVP and GM of Enterprise Storage at NetApp, saying: “Data is the key component to delivering business value for enterprises, underpinning performance hungry workloads like AI or databases. … With the new EF-Series systems, purpose-built for extreme performance, we’re enabling customers to deploy and scale high-throughput, low-latency workloads quickly and efficiently, while reducing data center footprint and operational overhead.”
Setting aside its Santricity OS, NetApp says: “Coupled with high-performance parallel file systems like Lustre or BeeGFS, the new EF50 and EF80 systems accelerate HPC simulations and keep GPUs fully utilized with high-performance scratch space.”
NetApp says customers have deployed more than 1 million EF-Series systems. There could then, conceivably, be more E-Series installations than ONTAP installations, which would surprise most of us.
...
Nvidia’s BlueField-4 STX is a modular reference architecture that enables enterprises, cloud and AI providers to deploy accelerated storage infrastructure capable of the long-context reasoning required for agentic AI.
The idea is to accelerate data delivery from storage to GPUs by front-ending the storage drives with the BlueField-4 (BF-4) DPU, which accelerates storage IO and supports KV cache extension to BF-4-fronted SSDs. This, in turn, supports longer-context reasoning, which needs more tokens, represented as vector embeddings, to be held in the cache rather than recomputed or, worse, fetched from external storage without BF-4 assistance. Without BF-4 assistance and KV cache extension, AI query and agent responsiveness would both suffer and GPUs could endure more idle time.
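The spill-rather-than-recompute idea can be sketched as a two-tier cache (a toy illustration under assumed behavior, not Nvidia's implementation; real systems move GPU tensors over the DPU fabric, not Python dict entries): when the hot tier fills, entries spill to a fast SSD tier instead of being discarded, so they can be promoted back later rather than recomputed.

```python
from collections import OrderedDict

# Toy sketch of a two-tier KV cache: a small "GPU memory" tier that spills
# to a larger "BF-4-fronted SSD" tier instead of evicting outright.
# All names are hypothetical; real systems move GPU tensors, not dict entries.
class TieredKVCache:
    def __init__(self, gpu_capacity):
        self.gpu = OrderedDict()   # hot tier (stands in for GPU/HBM memory)
        self.ssd = {}              # extension tier (stands in for fast SSD)
        self.gpu_capacity = gpu_capacity

    def put(self, token_id, kv):
        self.gpu[token_id] = kv
        self.gpu.move_to_end(token_id)
        while len(self.gpu) > self.gpu_capacity:
            old_id, old_kv = self.gpu.popitem(last=False)
            self.ssd[old_id] = old_kv          # spill, don't lose the entry

    def get(self, token_id):
        if token_id in self.gpu:
            return self.gpu[token_id]          # hit in GPU memory
        if token_id in self.ssd:
            kv = self.ssd.pop(token_id)        # promote from the SSD tier
            self.put(token_id, kv)
            return kv
        return None                            # miss: would force a recompute

cache = TieredKVCache(gpu_capacity=2)
for t in range(4):
    cache.put(t, f"kv{t}")
# Tokens 0 and 1 were spilled to the SSD tier, not lost:
assert cache.get(0) == "kv0"
```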
Jensen Huang, Nvidia CEO and founder, said: “Agentic AI is redefining what software can do — and the computing infrastructure behind it must be reinvented to keep pace. AI systems that reason across massive context and continuously learn require a new class of storage. Nvidia STX reinvents the storage stack, providing a modular foundation for AI-native infrastructure that keeps AI factories operating at peak performance.”
STX is accelerated by Vera Rubin and harnesses a new, storage-optimized BF-4 processor combining the Vera CPU with a ConnectX-9 SuperNIC, together with Spectrum-X Ethernet networking, DOCA and Nvidia’s AI Enterprise software.
Nvidia claims its STX architecture also enables 4x higher energy efficiency compared with traditional CPU architectures for high-performance storage, and can ingest 2x more pages per second for enterprise AI data.
The first rack-scale implementation includes Nvidia’s CMX context memory storage platform, which expands GPU memory with a high-performance context layer for scalable inference and agentic systems, providing up to 5x more tokens per second compared with traditional storage.
Storage providers and manufacturing partners are building infrastructure using Nvidia’s STX modular reference designs for agentic AI, including AIC, Cloudian, DDN, Dell Technologies, Everpure, Hitachi Vantara, HPE, IBM, MinIO, NetApp, Nutanix, Supermicro, Quanta Cloud Technology (QCT), VAST Data and WEKA.
Early adopters of STX for context memory storage include CoreWeave, Crusoe, IREN, Lambda, Mistral AI, Nebius, Oracle Cloud Infrastructure (OCI) and Vultr.
STX-based platforms will be available from Nvidia's partners in the second half of this year.
…
Samsung's next-generation HBM4E, delivering 16Gbps per pin and 4.0 terabytes-per-second (TB/s) bandwidth, will be on display for the first time at GTC 2026 - as will be its new sixth-generation HBM4, which is now in mass production and is designed for the Vera Rubin platform. Visitors can see Samsung’s hybrid copper bonding (HCB) technology, that will enable next-generation HBM to achieve 16 or more layers while reducing heat resistance by more than 20 percent, compared to thermal compression bonding (TCB).
As part of the new Nvidia BlueField-4 STX reference architecture for accelerated storage infrastructure in the Vera Rubin platform, Samsung will show how its PM1753 SSD helps enhance energy efficiency and system performance for inference workloads.
...
Supermicro is showcasing the upcoming release of the Supermicro Unified AI Data Platform with VAST, extending its Data Center Building Block Solutions into a unified architecture for enterprise AI. Supermicro brings the validated building blocks; VAST turns them into a single operating environment for data, intelligence, and AI services. The VAST AI OS provides the shared software foundation for resilient data services, databases, vector search, event-driven pipelines, and AI services.
The Supermicro EBox provides a VAST-certified all-flash foundation for unified file, object, block, and database services, while its CNode-X brings GPU-accelerated services closer to the data for vector search, SQL, retrieval, and agentic workflows.
...
VAST Data’s Foundation Stacks are open-source libraries used to deploy AI pipelines on the VAST AI OS. They provide a production-ready environment for Nvidia’s AI Blueprints which, VAST says, can otherwise take most organisations months to get running in the real world. Foundation Stacks unify data access, database services, compute orchestration, eventing, and pipeline execution into a single environment, enabling customers to deploy scalable AI pipelines without building complex infrastructure from scratch.
John Mao, VP, Global Technology Alliances at VAST, said: “With VAST Foundation Stacks, VAST is taking the architectural patterns behind leading Nvidia Blueprints and giving customers a faster path from experimentation to production for scalable AI pipelines, video intelligence, and agentic AI systems.”
Foundation Stacks can be repeatedly deployed anywhere the VAST AI OS runs, including in the cloud as well as on-premises via VAST's newly announced CNode-X platforms, as part of the Nvidia AI Data Platform reference design.
The first Foundation Stacks are based on Nvidia’s AI Blueprints for Video Search and Summarisation (VSS) and AI-Q.
The VSS-based VAST Foundation Stack enables organisations to ingest massive volumes of live or archived video and extract insights through semantic indexing, summarisation, and interactive Q&A, powered by the VAST AI Operating System.
The AI-Q based VAST Foundation Stack provides a foundation for building custom AI researchers that can operate across private, enterprise data sources, synthesising hours of research in minutes, using the VAST AI OS for persistent and secure context, scalable reasoning pipelines, and trusted agent execution.
More industry-focused implementations are planned.
...
And then there were three: VDURA announced the availability of Remote Direct Memory Access (RDMA) capability, the first phase of its Context-Aware Tiering technology planned for later this year, and optimized infrastructure configurations for the VDURA Data Platform built on AMD EPYC Turin processors and Nvidia ConnectX-7 high-speed networking adapters.
With RDMA, VDURA DirectFlow enables AI clusters to sustain peak throughput without CPU involvement, freeing compute resources for model execution and reducing end-to-end latency across the data pipeline.
Building on RDMA, the first phase of VDURA Context-Aware Tiering dynamically manages data placement across multiple storage tiers based on workload characteristics and access patterns. The initial phase introduces:
- Extended DirectFlow Buffer to Local SSD: Extends the DirectFlow buffer layer to local NVMe SSD, reducing dependency on network storage for hot data and minimizing latency for active AI workloads.
- KVCache Writeback for Persistence SLA: Intelligent writeback of KVCache data ensures only persistence-critical data is written back to durable storage, minimizing unnecessary I/O while maintaining SLA compliance for AI inference pipelines.
- Context Cache Tiering: A unified Context Cache Tiering framework enables seamless, high-speed read and write access across local SSD and DRAM tiers at LMCache speed, supporting AI inference use cases including long-context language model serving and retrieval-augmented generation.
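The writeback-for-SLA bullet can be illustrated with a toy sketch (hypothetical names, not VDURA's implementation): only entries flagged persistence-critical incur write I/O on flush, while recomputable scratch data is simply dropped.

```python
# Toy sketch of the "KVCache Writeback for Persistence SLA" idea above:
# only entries flagged persistence-critical are written back to durable
# storage; recomputable scratch entries are dropped, saving I/O.
# All names are hypothetical.
def flush_cache(entries, durable_store):
    written, dropped = 0, 0
    for key, (value, persist) in entries.items():
        if persist:
            durable_store[key] = value   # persistence-critical: write back
            written += 1
        else:
            dropped += 1                 # scratch data: no I/O spent on it
    return written, dropped

entries = {
    "session-42/ctx": ("kv-blob-a", True),    # needed after a restart
    "session-42/tmp": ("kv-blob-b", False),   # recomputable scratch
    "session-77/ctx": ("kv-blob-c", True),
}
durable = {}
written, dropped = flush_cache(entries, durable)
assert written == 2 and dropped == 1
```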
Ken Claffey, CEO of VDURA, said: “RDMA gives AI teams direct, zero-CPU-overhead access to their data. Context-Aware Tiering brings intelligence to every tier of the extended storage hierarchy, so data is always in the right place at the right time. Together, these capabilities enable organizations to run larger models, serve more inference requests, and efficiently scale AI infrastructure with the operational reliability that production AI demands.”
RDMA capability is available now for all V5000 and V7000-class systems running the VDURA Data Platform. Context-Aware Tiering phase 1 should be generally available later this year.
VDURA has a roadmap of additional Context-Aware Tiering capabilities planned through 2027, encompassing deeper application-directed data placement, expanded cross-node cache coherence, and broader hardware support for Nvidia BlueField-4 DPUs.
…
WEKA announced general availability of its enterprise-ready NeuralMesh AI Data Platform (AIDP), which delivers composable, high-performance infrastructure optimized for AI Factory deployments, and based on Nvidia’s AI Data Platform reference design.
Liran Zvibel, cofounder and CEO at WEKA, said: “WEKA’s NeuralMesh AIDP gives organizations everything they need to run always-on AI factories: extreme storage performance and the flexible architecture required to operationalize AI at production scale. Whether an organization is just beginning its AI journey or running full-stack NVIDIA deployments, NeuralMesh AIDP scales seamlessly as they grow.”
NeuralMesh AIDP enables enterprises and AI cloud providers to unify AI operations from retrieval to inference on a single, ready-to-deploy platform. With pre-integrated hardware and software options from Nvidia (including RTX PRO 6000 and RTX PRO 4500 Blackwell Server Edition GPUs) alongside Red Hat, Spectro Cloud and Supermicro, WEKA says customers can eliminate months of AI integration work.
It delivers ready-to-use pipelines for a spectrum of business use cases that work across verticals, including: Semantic Search, Video Search & Summarization (VSS), AlphaFold for drug discovery, AIQ/Agentic RAG and more.
The NeuralMesh AI Data Platform solution is available now, delivered as an appliance-style system. You can learn more here.
…
WEKA has integrated its NeuralMesh software with Nvidia’s STX reference architecture. WEKA’s Augmented Memory Grid memory extension technology running on NeuralMesh will support STX to bring high-throughput context memory storage to agentic AI factories, making long-context reasoning seamless across sessions, tools, and tasks.
WEKA reckons NeuralMesh with STX will deliver an estimated 4-10x more tokens per second for context memory, while supporting at least 320 GBps read and 150 GBps write throughput for AI workloads, more than double the throughput of conventional AI storage platforms.
Liran Zvibel said: “With coding LLMs advancing, we’re seeing unprecedented adoption of Agentic AI use cases for software engineering, where productivity increases by 100-1000x. As coding assistants make repeated calls against largely unchanged codebases and prompts, WEKA’s Augmented Memory Grid reuses cached context instead of forcing redundant prefill, even as context windows grow to incredible lengths. This provides a major boost in response times and greatly increases the number of concurrent users running on the same infrastructure.”
WEKA’s Augmented Memory Grid is a purpose-built memory extension layer that pools and persists KV cache outside of GPU memory, keeping long-context sessions stable and concurrency high as inference workloads grow. First unveiled at GTC 2025 and generally available to NeuralMesh customers today, Augmented Memory Grid has been validated with Supermicro on Nvidia Grace CPUs and BlueField-3 DPUs.
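The reuse pattern Zvibel describes amounts to prefix matching: if a new request shares a leading token sequence with a cached one, only the new suffix needs prefill. A minimal sketch (hypothetical names; a real cache stores per-token attention tensors, not strings):

```python
# Minimal sketch of prefix-based KV cache reuse, the pattern behind
# avoiding redundant prefill for repeated coding-assistant calls.
# Hypothetical names; a real cache holds per-token attention tensors.
def longest_cached_prefix(cached_tokens, prompt_tokens):
    """Return how many leading tokens of the prompt are already cached."""
    n = 0
    for a, b in zip(cached_tokens, prompt_tokens):
        if a != b:
            break
        n += 1
    return n

# A new prompt repeats the system message and codebase context verbatim:
prompt = ["<sys>", "repo.py", "def", "foo", ":", "fix", "bug"]
cached = ["<sys>", "repo.py", "def", "foo", ":"]   # from a previous call

reused = longest_cached_prefix(cached, prompt)
to_prefill = len(prompt) - reused
print(f"Reused {reused} cached tokens; prefill only {to_prefill} new ones")
```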
Get more information about NeuralMesh here and Augmented Memory Grid here.