NVIDIA Blackwell Sets New Standard for Gen AI in MLPerf Inference Debut
From NVIDIA: 2024-08-28 11:00:11
Enterprises are facing increased demands on data center infrastructure as they adopt generative AI. In the latest MLPerf Inference v4.1 benchmarks, NVIDIA platforms outperformed competitors, with the upcoming NVIDIA Blackwell platform delivering up to 4x the performance of the H100 Tensor Core GPU. The H200 Tensor Core GPU excelled across all data center benchmarks, including the Mixtral 8x7B mixture-of-experts (MoE) LLM workload.
Mixture-of-experts (MoE) models are gaining popularity for LLM deployments because of their versatility and efficiency. Serving the largest LLMs in real time requires multi-GPU compute, with interconnects such as NVIDIA NVLink and NVSwitch providing the high-bandwidth GPU-to-GPU communication needed to meet latency requirements. NVIDIA partners also made strong MLPerf Inference submissions, underscoring the wide availability of NVIDIA platforms. Continuous software development across NVIDIA platforms delivers regular performance and feature improvements.
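To illustrate why MoE models can be efficient, here is a minimal PyTorch sketch of top-k expert routing: a small router sends each token to only a few experts, so just a fraction of the layer's weights are active per token. The layer sizes, expert count, and top-k value are illustrative placeholders, not Mixtral's actual architecture (Mixtral 8x7B routes each token to 2 of 8 experts).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts
    per token, so only a fraction of the weights run on each token."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e           # tokens routed to expert e
                if mask.any():
                    w = weights[mask, k].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

x = torch.randn(16, 64)                         # 16 tokens
print(ToyMoELayer()(x).shape)                   # torch.Size([16, 64])
```

With top_k=2 of 8 experts, each token exercises roughly a quarter of the expert parameters, which is the efficiency argument made above; production systems use batched dispatch rather than this per-expert loop.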
The NVIDIA H200 GPU delivered up to 27% more generative AI inference performance than in the previous round. Triton Inference Server, part of the NVIDIA AI platform, offers significant performance gains and helps organizations consolidate model serving onto fewer inference servers. Deploying generative AI models at the edge, such as on the NVIDIA Jetson platform, can turn sensor data into real-time actionable insights.
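As a brief illustration of how applications talk to a consolidated Triton deployment, the sketch below uses Triton's Python HTTP client to send one inference request to a running server. The model name ("my_model") and the tensor names and shapes are placeholders; they would need to match the model's configuration in your Triton model repository.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder model/tensor names -- these must match the deployed
# model's config in the Triton model repository.
inp = httpclient.InferInput("INPUT0", [1, 16], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 16).astype(np.float32))

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```

Because many models behind one server share this same request path, teams can retire per-model serving stacks, which is the consolidation benefit described above.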
NVIDIA Jetson AGX Orin system-on-modules achieved considerable throughput and latency improvements in the latest MLPerf benchmarks, showcasing the platform’s versatility for edge AI. Overall, NVIDIA platforms demonstrated performance leadership across data center and edge applications in the MLPerf Inference tests. NVIDIA H200 GPU-powered systems are already available from various providers.
Read more at NVIDIA: NVIDIA Blackwell Sets New Standard for Gen AI in MLPerf Inference Debut