Samsung shipping fast and small 4 TB PCIe Gen5 mini-gumstick drive
Samsung announced its PM9E1 gumstick drive in October 2024 with 512 GB, 1 TB, and 2 TB capacities. It’s now shipping a 4 TB version, which it says is a good fit for Nvidia DGX Spark AI desktop workstations.
This is a dual-sided M.2 2242 drive optimized for space-constrained AI PCs and high-performance laptops. It’s built with 1 Tb dies of Samsung’s 236-layer Gen 8 V-NAND in TLC (3 bits/cell) format, with NAND and on-board DRAM on both sides of the board, a pseudo-SLC cache, and four lanes of PCIe Gen5. It delivers up to 2 million random read IOPS and 2.64 million random write IOPS, with 14.5 GBps sequential read bandwidth and 12.6 GBps sequential write bandwidth.
Samsung says it follows on from the earlier PM9A1 of 2021, a PCIe Gen4 drive in M.2 2280 format with 2 TB capacity, using 128-layer Gen 6 V-NAND, also with TLC tech. That drive delivered 1 million random read IOPS and 850,000 random write IOPS, with sequential read/write speeds of 7/5.2 GBps; considerably slower, as the Gen4 PCIe bus has half the bandwidth of the PCIe Gen5 bus.
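That gap lines up with the theoretical ceilings of the two bus generations. As a rough back-of-the-envelope check (our own illustration, not Samsung’s figures; it assumes standard 128b/130b line encoding and ignores protocol overhead):

```python
# Rough PCIe x4 bandwidth check (assumes 128b/130b encoding, ignores protocol overhead).
def pcie_x4_gbps(transfer_rate_gt_s: float) -> float:
    """Approximate usable one-way bandwidth of a 4-lane PCIe link in GB/s."""
    lanes = 4
    encoding_efficiency = 128 / 130  # 128b/130b line encoding
    bits_per_byte = 8
    return transfer_rate_gt_s * lanes * encoding_efficiency / bits_per_byte

print(f"PCIe Gen4 x4: ~{pcie_x4_gbps(16):.1f} GB/s")  # ~7.9 GB/s
print(f"PCIe Gen5 x4: ~{pcie_x4_gbps(32):.1f} GB/s")  # ~15.8 GB/s
```

The PM9A1’s 7 GBps reads sit close to the roughly 7.9 GBps Gen4 x4 ceiling, while the PM9E1’s 14.5 GBps reads approach the roughly 15.8 GBps Gen5 x4 ceiling.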
This 4 TB PM9E1, with its M.2 2242 format, is more compact than the M.2 2280 format PM9E1 drives with 512 GB, 1 TB, and 2 TB capacities. A Samsung tech blog quotes Task Leader Sunki Yun saying: “We decided to apply a smaller NAND package – originally planned for the V9 generation – to V8 as well. Thanks to close collaboration with the Package Development Team, we were able to accelerate development and move the project forward quickly. Because of the strict space limitations, we had to place all the components including NAND, DRAM, and the controller on both sides of a very small PCB. This brought additional challenges in PCB layout, BOM optimization, thermal management, and mechanical reliability.”
According to Samsung, the drive has Device Authentication and Firmware Tampering Attestation security features via the v1.2 Security Protocol and Data Model (SPDM). It has an in-house ‘Presto’ controller design, built on 5nm Samsung Foundry technology, with firmware optimized for DGX Spark OS software, Nvidia CUDA software, and the overall AI user experience.
No details of the optimizations are available, but the tech blog quotes Alex Choi of Samsung’s Memory Products Planning team on the topic: “When running large language models (LLMs), one of the most representative AI workloads, systems must handle extremely intensive data loading and training operations. During training in particular, checkpoint operations—which frequently save the model’s state—require very high sequential write performance.”
In essence, the sequential read and write speeds are key, with Choi saying: “[These] are crucial for quickly loading trained models and enabling fast, seamless inference and retraining.”
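To make the checkpointing point concrete, here is a minimal sketch, our own example rather than Samsung or Nvidia code, assuming a PyTorch environment and a hypothetical /mnt/nvme mount point for the drive. It saves a model checkpoint and reports the sequential write throughput achieved:

```python
import time
from pathlib import Path

import torch
import torch.nn as nn

# Hypothetical mount point for the Gen5 NVMe drive; adjust for your system.
CHECKPOINT_DIR = Path("/mnt/nvme/checkpoints")
CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)

# A stand-in model; real LLM checkpoints run to tens or hundreds of GB.
model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096))

path = CHECKPOINT_DIR / "step_000100.pt"
start = time.perf_counter()
torch.save(model.state_dict(), path)  # one large, mostly sequential write
elapsed = time.perf_counter() - start

size_gb = path.stat().st_size / 1e9
print(f"Wrote {size_gb:.2f} GB in {elapsed:.2f} s ({size_gb / elapsed:.2f} GB/s)")
```

Note that buffered writes can land in the OS page cache, so the measured figure may exceed the drive’s sustained rate unless the file is flushed to stable storage.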
A Samsung blog discusses the 4 TB PM9E1 in a little more detail.