The NVIDIA and Hugging Face partnership offers developers inference-as-a-service powered by NVIDIA NIM.

From NVIDIA: 2024-07-29 16:30:41

The NVIDIA and Hugging Face partnership will give 4 million developers access to NVIDIA-accelerated inference for large language models. A new inference-as-a-service capability on NVIDIA DGX Cloud will help developers quickly deploy popular open-source models such as Llama 3 and those from Mistral AI.

Announced at SIGGRAPH, the inference service complements Train on DGX Cloud, an AI training service already available through Hugging Face. Developers can easily compare open-source models on the Hugging Face Hub, then deploy them with performance optimized by NVIDIA NIM.
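
The Hub-side workflow can be sketched with the huggingface_hub client library. This is a minimal illustration rather than the exact service API: the model ID below is an assumption, and whether a given model is backed by an NVIDIA-accelerated endpoint depends on the Hub's current offerings and your account access.

```python
# Minimal sketch: calling a hosted model from the Hugging Face Hub.
# Assumes huggingface_hub is installed and HF_TOKEN holds a valid token;
# the model ID is illustrative and may require access approval.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # illustrative model ID
    token=os.environ["HF_TOKEN"],
)

# Send a chat-style request to the hosted inference endpoint.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```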

NVIDIA NIM is a collection of AI microservices optimized for inference, improving how efficiently language models process tokens. Served as a NIM, the 70-billion-parameter version of Llama 3 can achieve up to 5x higher throughput than off-the-shelf deployment on NVIDIA H100 Tensor Core GPU systems.
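
NIM language-model microservices expose an OpenAI-compatible API, so calling one typically looks like a standard chat-completion request. A minimal sketch, assuming an API key for NVIDIA's hosted API catalog and treating the base URL and model name below as illustrative:

```python
# Sketch: calling a NIM endpoint through its OpenAI-compatible API.
# Assumes the openai package is installed and NVIDIA_API_KEY holds a key
# for NVIDIA's hosted catalog; base URL and model name are illustrative.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # hosted NIM API endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

# Standard chat-completion request against the NIM-served model.
completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",  # illustrative NIM model name
    messages=[{"role": "user", "content": "What is inference-as-a-service?"}],
    max_tokens=128,
    temperature=0.2,
)
print(completion.choices[0].message.content)
```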

NVIDIA DGX Cloud provides accessible AI acceleration for generative AI applications, offering scalable GPU resources at every step from prototype to production, with no long-term infrastructure commitments. Hugging Face’s inference-as-a-service on DGX Cloud, powered by NIM microservices, lets developers experiment with the latest AI models in an enterprise-grade environment.

At SIGGRAPH, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework, accelerating developers’ ability to build highly accurate virtual worlds. Visit ai.nvidia.com to explore more than 100 NVIDIA NIM microservices spanning various industries.

Read more at NVIDIA: Hugging Face Offers Developers Inference-as-a-Service Powered by NVIDIA NIM