Large language models (LLMs) now span hundreds of billions of parameters, and their growing scale and reasoning capabilities demand ever-higher compute performance. The MLPerf Inference v5.1 benchmark suite includes models such as DeepSeek-R1, Llama 3.1, and Whisper, and NVIDIA's new Blackwell Ultra architecture debuted there with record-breaking performance.
Blackwell Ultra delivers substantially higher compute capability, setting new performance records on workloads such as DeepSeek-R1. NVIDIA combined NVFP4 acceleration, CUDA Graphs, and new parallelism techniques to maximize performance on DeepSeek-R1 and Llama 3.1 405B, while disaggregated serving further boosted inference throughput on large models.
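Of these techniques, CUDA Graphs is the easiest to illustrate: it captures a fixed sequence of GPU kernels once and replays the whole sequence with a single launch, removing the per-kernel launch overhead that can dominate small decode steps. The post shows no code, so the PyTorch sketch below is only a minimal illustration of the capture-and-replay pattern; the model and shapes are placeholders, not NVIDIA's actual serving stack.

```python
import torch

# Stand-in for one fixed-shape LLM decode step; any static workload captures the same way.
model = torch.nn.Linear(4096, 4096).cuda().eval()
static_input = torch.randn(8, 4096, device="cuda")

# Warm up on a side stream so one-time allocations are not recorded in the graph.
side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    with torch.no_grad():
        for _ in range(3):
            model(static_input)
torch.cuda.current_stream().wait_stream(side)

# Capture: kernels launched inside this context are recorded, not executed eagerly.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    with torch.no_grad():
        static_output = model(static_input)

# Replay: copy fresh data into the captured input buffer, then relaunch the
# entire recorded kernel sequence with a single call per step.
static_input.copy_(torch.randn(8, 4096, device="cuda"))
graph.replay()
torch.cuda.synchronize()
print(static_output.shape)  # torch.Size([8, 4096])
```

The key constraint is that tensor shapes and buffer addresses must stay fixed between replays, which is why inference engines pair graphs with static input/output buffers.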
Taken together, NVIDIA's MLPerf Inference v5.1 submission shows that pairing the Blackwell Ultra architecture with optimizations such as NVFP4, CUDA Graphs, and disaggregated serving yields significantly higher inference throughput and efficiency, underscoring NVIDIA's leadership in AI inference.
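Disaggregated serving splits inference into a compute-bound prefill phase (processing the full prompt and building the KV cache) and a latency-bound decode phase (generating tokens one at a time), run on separate worker pools so each can be batched and scaled independently. The source gives no implementation details, so the toy Python sketch below, with hypothetical names and a plain-data stand-in for the KV cache, only illustrates the hand-off pattern:

```python
from dataclasses import dataclass
from queue import Queue

@dataclass
class PrefillResult:
    request_id: int
    kv_cache: list[float]   # stand-in for the real KV-cache tensors
    next_token: str

def prefill_worker(prompts: Queue, handoff: Queue) -> None:
    """Compute-bound phase: process the full prompt once, emit the KV cache."""
    while not prompts.empty():
        rid, prompt = prompts.get()
        kv = [float(len(tok)) for tok in prompt.split()]  # fake attention state
        handoff.put(PrefillResult(rid, kv, next_token="<first>"))

def decode_worker(handoff: Queue, outputs: Queue, max_tokens: int = 4) -> None:
    """Latency-bound phase: generate tokens one at a time from the KV cache."""
    while not handoff.empty():
        res = handoff.get()
        tokens = [res.next_token]
        for step in range(max_tokens - 1):
            tokens.append(f"<tok{step}>")  # a real decode step would run the model
        outputs.put((res.request_id, " ".join(tokens)))

prompts, handoff, outputs = Queue(), Queue(), Queue()
prompts.put((0, "why is the sky blue"))
prefill_worker(prompts, handoff)   # in production these run on separate GPU pools
decode_worker(handoff, outputs)
print(outputs.get())
```

In a real deployment the hand-off carries KV-cache tensors between GPU pools over a fast interconnect, and each pool runs a batching policy tuned for its phase: throughput for prefill, latency for decode.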
Read more at NVIDIA: NVIDIA Blackwell Ultra Sets New Inference Records in MLPerf Debut