AI/ML

Qumulo adds autonomous AI management and GPU server data delivery features to Cloud Data Fabric


Qumulo has introduced three new AI-focused software capabilities – Helios AI Agent, Cloud AI Accelerator, and AI Networking – to its Cloud Data Platform (CDP) scale-out filesystem software, to make management and AI data selection and delivery more autonomous and effective.

The concept behind this is that AI can improve the software’s intrinsic ability to monitor and manage the performance of its entire data estate infrastructure, and to selectively cache and deliver data to GPU servers, without needing close sysadmin-level control. Qumulo’s Cloud Data Platform software can run on-premises and, natively, in the AWS, Azure and Google clouds, with a unified global namespace for its unstructured data. This forms a Cloud Data Fabric, and the three announcements are part of an AI data supply chain.

Doug Gourlay.

CEO Doug Gourlay said: “Enterprises today need more than just storage—they need systems that think, adapt, and accelerate. Helios gives our customers predictive awareness of their entire data ecosystem, Qumulo Cloud AI Accelerator puts their data into motion wherever insight is needed, and AI Networking redefines what’s possible in performance. This is the foundation for the next generation of reasoning infrastructure.”

Helios is an AI agent fed with system-wide telemetry – from on-premises systems and from Qumulo instances in the AWS, Azure and Google clouds – that can self-manage, self-diagnose, and self-optimize the Qumulo unstructured data environment. It receives billions of events daily from the infrastructure’s compute, storage, cloud, and network layers, and puts them in a unified model. Helios looks for and identifies emerging anomalies, predicts looming capacity or performance issues before they occur, and automatically generates prescriptive recommendations or remediation workflows to fix these nascent problems. Think of it as a kind of near-self-driving super-cruise control for the Qumulo data estate infrastructure.
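Qumulo has not published Helios internals, but the general pattern of flagging telemetry samples that deviate sharply from a recent baseline can be sketched as a rolling z-score detector. Everything here – class name, window size, threshold – is illustrative, not Qumulo’s actual design:

```python
from collections import deque
import math

class TelemetryAnomalyDetector:
    """Rolling z-score detector over a telemetry metric stream.

    Hypothetical sketch of the generic anomaly-detection pattern:
    a sample is flagged when it sits more than `threshold` standard
    deviations from the mean of the last `window` samples.
    """

    def __init__(self, window=100, threshold=3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Record a sample; return True if it looks anomalous."""
        anomalous = False
        if len(self.samples) >= 10:  # need a minimal baseline first
            mean = sum(self.samples) / len(self.samples)
            var = sum((s - mean) ** 2 for s in self.samples) / len(self.samples)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.threshold:
                anomalous = True
        self.samples.append(value)
        return anomalous
```

Fed a steady stream of, say, per-second read latencies, a detector like this stays quiet on normal jitter and flags a sudden spike – the kind of signal that would then drive a prescriptive recommendation or remediation workflow.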

Helios has MCP (Model Context Protocol) support and so extends its reach into Qumulo’s partner ecosystem, “allowing external agents and orchestration frameworks to participate in the same reasoning fabric, creating a truly autonomous data platform.”

The Cloud AI Accelerator is focussed on moving data from Qumulo Cloud Data Fabric stores to a GPU server, using NeuralCache technology to predictively cache and reduce GPU data load times by up to 64 percent. It can be deployed in minutes, Qumulo says, to any major cloud, region, or availability zone, and acts as an ephemeral read/write spoke for the Cloud Data Fabric.

It can be scaled from one instance to hundreds as needed and dynamically optimizes data paths, making sure that data only moves when and where it is needed, with minimal latency and no manual orchestration. Training, inferencing, and reasoning workloads get access to a single source of data truth with, the company says, “strict security, governance, and control of the data.”
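NeuralCache’s actual policies are not public, but the underlying predictive-caching idea – serve hot blocks locally and warm the blocks a workload is likely to ask for next – can be sketched as a read-through LRU cache with sequential prefetch. All names and parameters here are assumptions for illustration:

```python
from collections import OrderedDict

class ReadAheadCache:
    """Minimal read-through LRU cache with sequential prefetch.

    Hypothetical sketch: on a miss, fetch the requested block from the
    backing store and speculatively warm the next `prefetch` blocks,
    so a sequential reader (e.g. a GPU training job streaming a
    dataset) mostly hits in cache.
    """

    def __init__(self, backend_fetch, capacity=1024, prefetch=4):
        self.fetch = backend_fetch  # callable: block_id -> bytes
        self.capacity = capacity
        self.prefetch = prefetch
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self._insert(block_id, data)
        # Predict sequential access: warm the next few blocks.
        for nxt in range(block_id + 1, block_id + 1 + self.prefetch):
            if nxt not in self.cache:
                self._insert(nxt, self.fetch(nxt))
        return data
```

A first read of block 0 misses and pre-warms blocks 1–4, so the follow-on reads hit locally instead of going back to the hub – the same shape of win, in miniature, as the claimed reduction in GPU data load times.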

Blocks & Files graphic. The data movers will also receive output from the GPU servers and send it back to the originating Qumulo centre.

AI Networking brings in new data movers tuned for AI training, inferencing and reasoning workloads running on GPU servers. The data movers natively support RDMA (Remote Direct Memory Access), RDMA over Converged Ethernet v2 (RoCEv2), and NFS over RDMA, with S3 over RDMA in development. They “provide near-memory bandwidth between storage and accelerated compute clusters, dramatically reducing latency and CPU overhead for large-scale AI operations.”
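NFS over RDMA is already supported by the mainline Linux NFS client, so on a host with a RoCEv2-capable NIC a mount over the RDMA transport looks roughly like this (the server name and export path are placeholders, not real Qumulo endpoints):

```shell
# Load the client-side RDMA transport for SUNRPC
# (kernels built with CONFIG_SUNRPC_XPRT_RDMA)
modprobe xprtrdma

# Mount the export over RDMA; 20049 is the conventional NFS/RDMA port
mount -t nfs -o vers=3,proto=rdma,port=20049 qumulo-node:/export /mnt/ai-data
```

With `proto=rdma`, bulk data moves via direct memory placement rather than through the kernel TCP stack, which is where the claimed reduction in latency and CPU overhead comes from.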

These data movers enable “seamless integration with Nvidia DGX, AMD Instinct, and other GPU-rich compute infrastructures.”

Qumulo’s three new capabilities are available in preview for select customers starting today, with general availability expected over the next quarter. Demonstrations will be given at SC25 on booth 4407, where Qumulo solutions engineers can provide hands-on advice on both HPC and AI workflows.