WEKA has made its AI-accelerating Augmented Memory Grid commercially available in the NeuralMesh AI filesystem product and launched two new WEKApod appliances.
It claims this enables customers “to break through the [GPU server] memory wall with major efficiency gains that make AI cost-effective at scale.”
NeuralMesh is WEKA’s parallel filesystem software. Its Augmented Memory Grid lets AI models extend GPU server memory for inferencing into NeuralMesh’s external storage, using it as a KV cache with microsecond latencies and multi-TBps bandwidth, and adding up to petabytes of extra memory address space. This, WEKA says, can reduce “time-to-first-token by up to 20x,” enabling “AI builders to streamline long-context reasoning and agentic AI workflows.”
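WEKA hasn’t published the interface, but the underlying technique, tiering a transformer’s KV cache between GPU memory and a far larger external store, can be sketched in a few lines. Everything below (class name, methods, capacities) is illustrative, not WEKA’s API:

```python
import numpy as np

class TieredKVCache:
    """Toy two-tier KV cache: a small, fast in-GPU-memory tier backed by a
    much larger external store, the pattern Augmented Memory Grid applies
    at scale. Names and sizes here are illustrative, not WEKA's."""

    def __init__(self, gpu_capacity_entries: int):
        self.gpu_capacity = gpu_capacity_entries
        self.gpu_tier: dict[str, np.ndarray] = {}       # hot KV blocks (simulated GPU memory)
        self.external_tier: dict[str, np.ndarray] = {}  # e.g. external filesystem-backed tier

    def put(self, prompt_id: str, kv_block: np.ndarray) -> None:
        if len(self.gpu_tier) >= self.gpu_capacity:
            # Evict the oldest block to external storage instead of
            # discarding it, so its prefill never has to be recomputed.
            victim, block = next(iter(self.gpu_tier.items()))
            del self.gpu_tier[victim]
            self.external_tier[victim] = block
        self.gpu_tier[prompt_id] = kv_block

    def get(self, prompt_id: str) -> np.ndarray | None:
        if prompt_id in self.gpu_tier:               # hit in GPU memory
            return self.gpu_tier[prompt_id]
        if prompt_id in self.external_tier:          # hit in external tier:
            block = self.external_tier.pop(prompt_id)  # fetch, then re-promote
            self.put(prompt_id, block)
            return block
        return None  # true miss: caller must recompute the prefill

cache = TieredKVCache(gpu_capacity_entries=2)
cache.put("prompt-a", np.zeros((2, 8)))   # prefill results for three prompts
cache.put("prompt-b", np.zeros((2, 8)))
cache.put("prompt-c", np.zeros((2, 8)))   # evicts prompt-a to the external tier
assert cache.get("prompt-a") is not None  # served from external tier, not recomputed
```

The point of the second tier is that a GPU-memory “miss” becomes a fast fetch from external storage rather than a full prefill recomputation, which is where the claimed time-to-first-token reduction would come from.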
Liran Zvibel, co-founder and CEO at WEKA, said: “We’re bringing to market a proven solution validated with Oracle Cloud Infrastructure and other leading AI infrastructure platforms. Scaling agentic AI isn’t just about raw compute—it’s about solving the memory wall with intelligent data pathways. Augmented Memory Grid enables customers to run more tokens per GPU, support more concurrent users, and unlock entirely new service models for long-context workloads.
“OCI’s bare metal infrastructure with high-performance RDMA networking and GPUDirect Storage capabilities makes it a unique platform for accelerating inference at scale.”
An 8-node OCI cluster provided 7.5 million read IOPS and 1 million write IOPS in WEKA testing
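If performance scaled evenly across the cluster, that works out to roughly 940,000 read IOPS and 125,000 write IOPS per node.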
WEKApod Appliances
The WEKApod is a storage server appliance that stores and pushes data out to GPU servers, such as those in Nvidia’s DGX SuperPOD. It combines pre-configured hardware with WEKA Data Platform software storage nodes, and comes in two models: the PCIe Gen 4-based Prime and the faster PCIe Gen 5-based Nitro.
The new Prime appliance achieves 65 percent better price-performance by intelligently placing data, based on workload characteristics, across mixed flash configurations: high-capacity SSDs and high-performance eSSDs, both using TLC NAND, which lowers costs without sacrificing write performance. WEKA calls this its AlloyFlash feature. Prime configurations support up to 20 drives in a 1 RU chassis or 40 in a 2 RU chassis.
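WEKA hasn’t detailed AlloyFlash’s placement logic; a minimal sketch of the general idea, routing data across two flash classes by workload profile, might look like the following. The field names and thresholds are assumptions for illustration, not WEKA’s implementation:

```python
from dataclasses import dataclass
from enum import Enum

class FlashTier(Enum):
    PERFORMANCE = "high-performance eSSD"   # latency-sensitive, write-heavy data
    CAPACITY = "high-capacity SSD"          # colder, read-mostly data

@dataclass
class WorkloadProfile:
    """Illustrative per-object stats a placement engine might track."""
    write_fraction: float     # share of accesses that are writes (0.0-1.0)
    accesses_per_hour: float  # recent access rate

def place(profile: WorkloadProfile,
          hot_threshold: float = 100.0,
          write_heavy: float = 0.3) -> FlashTier:
    """Route hot or write-heavy data to the performance tier,
    everything else to the cheaper capacity tier."""
    if profile.write_fraction >= write_heavy or profile.accesses_per_hour >= hot_threshold:
        return FlashTier.PERFORMANCE
    return FlashTier.CAPACITY

# Example: a checkpoint shard written constantly lands on fast flash,
# while an archived dataset lands on high-capacity flash.
print(place(WorkloadProfile(write_fraction=0.8, accesses_per_hour=500)))  # PERFORMANCE
print(place(WorkloadProfile(write_fraction=0.01, accesses_per_hour=2)))   # CAPACITY
```

Because both tiers use TLC NAND, a policy like this trades only capacity economics, not write endurance or write speed, which is presumably how Prime keeps write performance while cutting cost.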
The updated Prime delivers 4.6x better capacity density and 5x better write IOPS per rack unit than the previous generation, 4x better power density (23,000 IOPS per kW, or 1.6 PB per kW), and 68 percent less power consumption per terabyte.
WEKA says the appliance line “now provides the right level of performance and economics for every AI workload. AI clouds can serve diverse customer needs with configurations purpose-built for each profile—from performance-intensive training to cost-optimized inference.”
The Nitro version doubles performance density with refreshed hardware, and WEKA says this “makes it ideal for large-scale object storage repositories and AI data lakes that demand performance without compromise.” It is “purpose-built for AI factories running hundreds or thousands of GPUs, delivers 2x faster performance and 60 percent better price-performance through upgraded hardware, including Nvidia’s ConnectX-8 SuperNIC, delivering 800 Gbps throughput” with 20 TLC drives in a 1 RU chassis.
Naturally, these WEKApod appliances are a great base on which to deploy the NeuralMesh software. Both are certified as turnkey appliances for Nvidia DGX SuperPOD and Nvidia Cloud Partner use.
WEKA Chief Product Officer Ajay Singh said: “WEKApod Prime delivers 65 percent better price-performance without compromising on speed, while WEKApod Nitro doubles performance to maximize GPU utilization. The result: faster model development, higher inference throughput, and better returns on compute investments that directly impact profitability and time-to-market.”
The next-generation WEKApod appliances are available now. Augmented Memory Grid is available as an optional feature for NeuralMesh deployments and on the Oracle Cloud Marketplace, with additional cloud platforms to follow. Read more about the Augmented Memory Grid here.