NVIDIA Blackwell leads the InferenceMAX v1 benchmarks for AI inference in both performance and efficiency. According to NVIDIA, a $5 million investment in a GB200 NVL72 system can generate $75 million in token revenue, a 15x return on investment. NVIDIA B200 reaches a cost of two cents per million tokens on gpt-oss, which NVIDIA presents as the lowest total cost of ownership.
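The 15x figure is simple arithmetic on the two dollar amounts quoted above. A minimal sketch, assuming "return on investment" here means token revenue divided by the initial investment (the function name `roi_multiple` is illustrative and does not come from NVIDIA or InferenceMAX):

```python
def roi_multiple(investment_usd: float, revenue_usd: float) -> float:
    """Return revenue expressed as a multiple of the initial investment."""
    if investment_usd <= 0:
        raise ValueError("investment must be positive")
    return revenue_usd / investment_usd


# Figures quoted in the article for a GB200 NVL72 deployment.
investment = 5_000_000
token_revenue = 75_000_000

multiple = roi_multiple(investment, token_revenue)
print(f"{multiple:.0f}x")  # prints "15x"
```

Note that this is a revenue multiple, not net profit: subtracting the investment first would give 14x.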

The InferenceMAX v1 results emphasize efficiency and economics at scale, areas where NVIDIA credits its full-stack approach for Blackwell’s performance in AI factories. Collaborations with OpenAI, Meta, and DeepSeek AI showcase advances in reasoning and efficiency for AI models.

NVIDIA’s software optimizations, including the TensorRT LLM v1.0 release, significantly improve performance for large AI models such as gpt-oss-120b. The gpt-oss-120b-Eagle3-v2 model introduces speculative decoding, which triples throughput and raises per-GPU speeds. Blackwell also sets a new performance standard in the InferenceMAX v1 benchmarks for dense AI models such as Llama 3.3 70B.

Efficiency metrics such as tokens per watt and cost per million tokens determine the value an AI factory delivers. Blackwell provides 10x the throughput per megawatt of the previous generation, which translates into higher token revenue. It also lowers cost per million tokens by 15x versus the previous generation, which supports wider AI deployment and innovation.
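To make these two metrics concrete, here is a rough sketch of how they are commonly derived from throughput, power draw, and hourly cost. The formulas are generic unit conversions, not the InferenceMAX methodology, and every input value below is a hypothetical placeholder rather than a benchmark result:

```python
def cost_per_million_tokens(cost_per_hour_usd: float, tokens_per_second: float) -> float:
    """Dollars spent to generate one million tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return cost_per_hour_usd / tokens_per_hour * 1_000_000


def throughput_per_megawatt(tokens_per_second: float, power_watts: float) -> float:
    """Tokens per second delivered for each megawatt of power drawn."""
    return tokens_per_second / (power_watts / 1_000_000)


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
hourly_cost = 3.60        # USD per GPU-hour (assumption)
baseline_tps = 1_000      # tokens/s per GPU (assumption)
gpu_power = 1_000         # watts per GPU (assumption)

baseline_cost = cost_per_million_tokens(hourly_cost, baseline_tps)
# At a fixed hourly cost, 15x the throughput gives 15x lower cost per million tokens.
improved_cost = cost_per_million_tokens(hourly_cost, baseline_tps * 15)

print(f"baseline: ${baseline_cost:.2f} per million tokens")
print(f"improved: ${improved_cost:.4f} per million tokens")
print(f"{throughput_per_megawatt(baseline_tps, gpu_power):,.0f} tokens/s per MW")
```

In practice hourly cost and power draw also change between hardware generations, so the ratios quoted in the article reflect more than throughput alone.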

InferenceMAX uses the Pareto frontier to map performance, showing how Blackwell balances cost, energy efficiency, throughput, and responsiveness. Blackwell’s full-stack design delivers efficiency and value in production scenarios. Extreme hardware-software codesign, an annual hardware cadence, and continuous software optimization all contribute to Blackwell’s leadership in AI inference.
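A Pareto frontier is the set of configurations that no other configuration beats on every axis at once. The sketch below illustrates the idea on two axes, throughput per GPU and per-user responsiveness, where higher is better for both. The `Config` type, its field names, and the sample values are hypothetical and are not taken from InferenceMAX data or tooling:

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    throughput_per_gpu: float  # tokens/s per GPU, higher is better
    interactivity: float       # tokens/s per user, higher is better


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is at least as good as `b` on both axes and better on one."""
    at_least_as_good = (
        a.throughput_per_gpu >= b.throughput_per_gpu
        and a.interactivity >= b.interactivity
    )
    strictly_better = (
        a.throughput_per_gpu > b.throughput_per_gpu
        or a.interactivity > b.interactivity
    )
    return at_least_as_good and strictly_better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep only configurations that no other configuration dominates."""
    frontier = [
        c for c in configs
        if not any(dominates(other, c) for other in configs)
    ]
    return sorted(frontier, key=lambda c: c.interactivity)


# Made-up serving configurations, for illustration only.
sample_configs = [
    Config("A", 10_000, 20),
    Config("B", 6_000, 50),
    Config("C", 5_000, 40),   # dominated by B
    Config("D", 2_000, 100),
    Config("E", 1_500, 90),   # dominated by D
]

frontier_names = [c.name for c in pareto_frontier(sample_configs)]
print(frontier_names)  # prints ['A', 'B', 'D']
```

Each point on the frontier represents a different trade-off, so a team with a strict latency target would pick from the high-interactivity end, while a batch workload would pick from the high-throughput end.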

AI is transitioning from pilots to AI factories, where data is transformed into tokens and decisions in real time. Open benchmarks like InferenceMAX help teams make informed platform choices and optimize for cost per token and latency service-level agreements. NVIDIA’s Think SMART framework guides enterprises in maximizing ROI with NVIDIA’s full-stack inference platform.

Read more at NVIDIA: NVIDIA Blackwell Raises Bar in New InferenceMAX Benchmarks, Delivering Unmatched Performance and Efficiency