HPE is building two new systems for the Oak Ridge National Laboratory (ORNL): Discovery, a successor to Frontier that will use HPE’s Cray GX5000 exascale supercomputer for the converged AI and high-performance computing (HPC) era, with a K3000 DAOS storage option, and Lux, an AI system.
Discovery will use the GX5000 supercomputer for physics-based modeling, simulation, data-driven AI models, and testbed capabilities for quantum computing. It will have both the DAOS-based K3000 storage system and the Lustre-based E2000. HPE says it features state-of-the-art capabilities across CPUs, GPUs, accelerators, networking, software, storage and liquid cooling.
ORNL currently operates the HPE-built exascale Frontier supercomputer, which is based on the EX2000 variant of the Cray EX architecture and uses the ClusterStor E1000 Lustre parallel file system. Frontier is ranked #2 globally on the TOP500 list (as of June 2025) after being overtaken by Lawrence Livermore’s El Capitan.
The all-flash GX5000 is more compact than the current Cray EX4000, needing 25 percent less data center space per rack. It uses HPE’s latest-generation Slingshot interconnect, Slingshot 400, which has a 400 Gbps line speed from its 51.2 Tbps switch ASIC, double the Slingshot 200 speed. The EX4000 is larger and more powerful than the EX2000 variant used by ORNL.
HPE President and CEO Antonio Neri said: “When we built Frontier for Oak Ridge National Laboratory and ushered in exascale, we achieved the pinnacle in supercomputing history and a triumph for the U.S. We are proud to build on that leadership innovation and strong public-private partnership with the U.S. Department of Energy, ORNL and AMD, to build Discovery and Lux, accelerating the next era of scientific discovery and AI innovation.”
Lux will be a dedicated, multi-tenant AI system based on the direct liquid-cooled ProLiant Compute XD685. It will feature AMD Instinct MI355X GPUs, EPYC CPUs and Pensando networking, and will provide researchers across the U.S. with cloud-like access to a sovereign AI factory for training and inference.
Bronson Messer, Director of Science for the Oak Ridge Leadership Computing Facility, said: “We expect both systems will contribute to a paradigm shift in our productivity, reaching unparalleled gains in various, critical areas of scientific research and leadership.”
The GX5000 delivers up to 75 million IOPS per fully populated rack. That is 39 percent more than the 54 million delivered by a Cray E2000 rack with 18 all-flash SSUs (Scalable Storage Units) inside it, and more than four times the 18 million IOPS pumped out by Frontier’s E1000 storage subsystem. It also has a 25 percent smaller rack footprint than the EX4000.
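As a quick sanity check on those ratios, the short Python sketch below recomputes them from the three IOPS figures quoted above. The variable and function names are ours, chosen purely for illustration, and since it is not stated whether the Frontier figure is per rack, the second ratio should be read as a rough comparison only.

```python
# IOPS figures quoted in the article, in millions.
GX5000_RACK_MIOPS = 75.0      # per fully populated rack
E2000_RACK_MIOPS = 54.0       # rack with 18 all-flash SSUs
FRONTIER_E1000_MIOPS = 18.0   # Frontier's E1000 storage subsystem


def percent_higher(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


gain_over_e2000 = percent_higher(GX5000_RACK_MIOPS, E2000_RACK_MIOPS)
ratio_to_e1000 = GX5000_RACK_MIOPS / FRONTIER_E1000_MIOPS

print(f"vs E2000 rack: {gain_over_e2000:.0f} percent higher")
print(f"vs Frontier E1000 subsystem: {ratio_to_e1000:.1f}x")
```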
The K3000 is the first-ever factory-built DAOS (Distributed Asynchronous Object Storage) storage system and complements the existing E2000, the Lustre file system-based HPE Cray Supercomputing Storage System, which will also be featured in Discovery. HPE says DAOS-based storage systems are ranked #1 (Aurora at Argonne National Laboratory) and #2 (SuperMUC at Leibniz Supercomputing Center) on the global IO500 storage benchmark and together have four times the storage benchmark score of the next 30 storage systems.
The E2000 system architecture has four main elements: System Management Unit (SMU), Metadata Units, Data Units (storage nodes) and Expansion Units. We’ve diagrammed the setup:
The K3000 DAOS rig is quite different, as we understand it, and we’ve diagrammed that as well:
Overall, the DAOS K3000 configuration looks to be much simpler, as well as much faster, than the Lustre E2000 configuration.
A DAOS Storage Engine (storage node) is a 1 RU HPE ProLiant DL360 Gen12 server with 20 EDSFF bays for NVMe RI E3.S SSDs. There can be up to 40 of these in a K3000 DAOS rack.
The capacity of a K3000 rack depends upon the number of storage nodes in the rack, with a minimum configuration of 4 storage nodes and a maximum of 40 (with a rear door heat exchanger). The average config is 20 storage nodes. The max raw capacity is 12.32 PB, for 40 storage nodes each holding 20 SSDs of 15.4 TB capacity. The usable capacity depends on the data protection/redundancy layout chosen for the specific customer situation.
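The raw capacity arithmetic is simple enough to sketch in a few lines of Python. The example below reproduces the 12.32 PB maximum and then shows how usable capacity could shrink under a few protection layouts. The layouts and their efficiency fractions are our own illustrative assumptions, not HPE figures, and they ignore metadata and spare-space overheads.

```python
SSD_CAPACITY_TB = 15.4   # largest SSD capacity quoted in the article
SSDS_PER_NODE = 20       # drive bays in a fully populated storage node

# ASSUMPTION: hypothetical protection layouts and the fraction of raw
# capacity each would leave usable. These are not HPE-published numbers.
ASSUMED_LAYOUT_EFFICIENCY = {
    "2-way replication": 1 / 2,
    "3-way replication": 1 / 3,
    "8+2 erasure coding": 8 / 10,
}


def raw_capacity_pb(nodes: int,
                    ssds_per_node: int = SSDS_PER_NODE,
                    ssd_tb: float = SSD_CAPACITY_TB) -> float:
    """Raw rack capacity in PB, using decimal units (1 PB = 1000 TB)."""
    return nodes * ssds_per_node * ssd_tb / 1000.0


def usable_capacity_pb(raw_pb: float, layout: str) -> float:
    """Usable capacity under one of the assumed layouts above."""
    return raw_pb * ASSUMED_LAYOUT_EFFICIENCY[layout]


# Minimum, average and maximum node counts from the article.
for nodes in (4, 20, 40):
    raw = raw_capacity_pb(nodes)
    print(f"{nodes:>2} nodes: {raw:.2f} PB raw")
    for layout in ASSUMED_LAYOUT_EFFICIENCY:
        usable = usable_capacity_pb(raw, layout)
        print(f"    {layout}: {usable:.2f} PB usable (assumed)")
```

Keeping the layout efficiencies in one dictionary makes it easy to swap in whatever redundancy scheme a given deployment actually uses.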
There are four DAOS storage node densities to choose from, starting with three performance-optimized configurations:
- K3000 8EDSFF SSU (Server Storage Unit) Controller (8 SSDs per node)
- K3000 12EDSFF SSU Controller (12 SSDs per node)
- K3000 16EDSFF SSU Controller (16 SSDs per node)
There is a single capacity-optimized config with a K3000 20EDSFF SSU controller and 20 SSDs per node. Three E3.S SSD capacities are available: 3.84, 7.68 and 15.4 TB. We understand these are PCIe Gen 4 drives.
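To see how the node densities and SSD capacities combine, the Python sketch below tabulates raw capacity per node and per 40-node rack for every pairing. It assumes that each SSD capacity can be ordered with each node density, which we have not confirmed; the names used are ours and purely illustrative.

```python
from itertools import product

SSDS_PER_NODE_OPTIONS = (8, 12, 16, 20)   # the four node densities
SSD_CAPACITIES_TB = (3.84, 7.68, 15.4)    # the three E3.S capacities
MAX_NODES_PER_RACK = 40                   # maximum quoted in the article


def node_raw_tb(ssds: int, ssd_tb: float) -> float:
    """Raw capacity of a single storage node in TB."""
    return ssds * ssd_tb


def build_capacity_matrix() -> dict:
    """Map (SSDs per node, SSD TB) to raw TB per node.

    ASSUMPTION: every density/capacity pairing is orderable.
    """
    return {
        (ssds, tb): node_raw_tb(ssds, tb)
        for ssds, tb in product(SSDS_PER_NODE_OPTIONS, SSD_CAPACITIES_TB)
    }


matrix = build_capacity_matrix()

print(f"{'SSDs/node':>9} {'SSD TB':>7} {'TB/node':>9} {'PB/rack':>9}")
for (ssds, tb), per_node_tb in matrix.items():
    # Decimal units: 1 PB = 1000 TB.
    per_rack_pb = per_node_tb * MAX_NODES_PER_RACK / 1000.0
    print(f"{ssds:>9} {tb:>7.2f} {per_node_tb:>9.2f} {per_rack_pb:>9.2f}")
```

The last row of the table corresponds to the 12.32 PB raw maximum for a 40-node rack of 20-drive nodes with 15.4 TB SSDs.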
Johan Lombardi, an HPE Senior Distinguished Technologist, will discuss the upcoming DAOS 2.8 version at the DAOS User Group meeting on Nov 16, adjacent to the Supercomputing 2025 event.
Bootnote
A description of DAOS’ storage technology can be found here.
Discovery is expected to be delivered in 2028 and be ready for operations in 2029.