NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf
From NVIDIA: [published_date]
NVIDIA has set a new record as the fastest platform for generative AI inference, according to industry-standard tests. Running the NVIDIA TensorRT-LLM software on Hopper architecture GPUs, the company boosted performance nearly 3x over its previous results, showcasing the strength of NVIDIA’s full-stack platform.
The latest MLPerf benchmarks showed that TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs delivered the fastest performance in the generative AI inference tests. Thanks to their larger, faster memory, these GPUs can handle state-of-the-art large language models with remarkable efficiency.
NVIDIA is launching the H200 GPUs with significant memory enhancements: 141GB of HBM3e memory running at 4.8TB/s. That capacity is enough to hold many large language models on a single GPU, simplifying deployment and raising throughput for inference tasks.
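Why memory bandwidth matters so much for inference can be shown with a back-of-envelope estimate (this calculation is an illustration, not a figure from the article): when decoding one token at a time, a bandwidth-bound LLM must stream its full set of weights from memory per token, so HBM bandwidth caps single-stream token throughput.

```python
# Rough, illustrative estimate of single-stream decode speed on an H200.
# Assumption (not from the article): the model is memory-bandwidth bound,
# so every parameter is read from HBM once per generated token.

HBM_CAPACITY_GB = 141      # H200 HBM3e capacity (from the article)
HBM_BANDWIDTH_TBPS = 4.8   # H200 HBM3e bandwidth in TB/s (from the article)

def max_tokens_per_second(param_count_billions: float,
                          bytes_per_param: int = 2) -> float:
    """Upper bound on single-stream decode rate for a model of the given
    size, assuming every parameter is read once per token (FP16 = 2 bytes)."""
    model_bytes = param_count_billions * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = HBM_BANDWIDTH_TBPS * 1e12
    return bandwidth_bytes_per_s / model_bytes

# A 70B-parameter model in FP16 occupies ~140GB -- it just fits in the
# H200's 141GB, which is one reason the added capacity matters.
print(f"~{max_tokens_per_second(70):.1f} tokens/s bandwidth-bound ceiling")
```

In practice, batching, quantization, and kernel optimizations such as those in TensorRT-LLM move real throughput well away from this naive single-stream bound; the sketch only shows why capacity and bandwidth are the headline specs.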
The NVIDIA GH200 Superchips combine Hopper architecture GPUs with NVIDIA Grace CPUs on one module, offering up to 624GB of memory, including 144GB of HBM3e memory. These superchips have demonstrated standout performance, especially in memory-intensive MLPerf tests, showcasing their capabilities.
In the latest MLPerf industry benchmarks, NVIDIA’s Hopper GPUs delivered top performance in every test of AI inference, spanning popular workloads such as generative AI, recommendation systems, and natural language processing. These performance gains can translate into lower costs for inference across a wide range of applications.
NVIDIA showcased innovative techniques in the MLPerf benchmarks, such as structured sparsity, pruning, and DeepCache optimization, on H100 Tensor Core GPUs. These techniques delivered significant speedups for inference tasks, highlighting NVIDIA’s commitment to advancing AI methods and technologies.
MLPerf tests are transparent and objective, allowing users to make informed decisions when evaluating AI systems and services. NVIDIA’s partners, including leading companies like ASUS, Dell Technologies, Google, and Microsoft Azure, participate in MLPerf to showcase the performance of the NVIDIA AI platform for customers.
As the use cases for generative AI expand, NVIDIA continues to push the boundaries with new architectures and technologies. The upcoming NVIDIA Blackwell architecture GPUs are expected to deliver new levels of performance required for multitrillion-parameter AI models, demonstrating NVIDIA’s dedication to innovation in the AI space.