NVIDIA Accelerates Microsoft’s Open Phi-3 Mini Language Models

From Nvidia: 2024-04-23 13:08:23

NVIDIA is accelerating Microsoft’s Phi-3 Mini open language model with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference on NVIDIA GPUs. The 3.8-billion-parameter model comes in two variants with 4K and 128K token context windows; the long 128K window lets the model draw on very large inputs, enabling more relevant, context-aware responses. Developers can try the model at ai.nvidia.com, where it is available for both research and commercial use.
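
As a concrete starting point, here is a minimal sketch of querying a hosted Phi-3 Mini endpoint through an OpenAI-compatible API, a common pattern for NVIDIA’s hosted model services. The base URL, model identifier, and API-key environment variable below are illustrative assumptions, not details confirmed by the article; check ai.nvidia.com for the exact values.

```python
import os

from openai import OpenAI

# Assumed OpenAI-compatible endpoint and credential variable for
# NVIDIA's hosted models; both are illustrative.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

# Assumed model identifier for the 128K-context Phi-3 Mini variant.
response = client.chat.completions.create(
    model="microsoft/phi-3-mini-128k-instruct",
    messages=[{"role": "user", "content": "Summarize TensorRT-LLM in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```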

Phi-3 Mini’s 3.8 billion parameters make it compact enough for resource-constrained edge devices while still delivering high accuracy: it can outperform some larger models on key language benchmarks while meeting latency requirements. TensorRT-LLM supports the model’s long context window and applies optimizations such as LongRoPE positional scaling and FP8 precision, improving inference throughput and latency.
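
For local deployment, a sketch like the following shows how Phi-3 Mini might be run through TensorRT-LLM’s high-level Python API, which handles engine building behind the scenes. It assumes a recent TensorRT-LLM release that exposes the `LLM` and `SamplingParams` classes, and the Hugging Face model ID `microsoft/Phi-3-mini-128k-instruct`; neither detail comes from the article itself.

```python
from tensorrt_llm import LLM, SamplingParams

# Assumed Hugging Face checkpoint name; the high-level API converts it
# into a TensorRT engine optimized for the local GPU on first use.
llm = LLM(model="microsoft/Phi-3-mini-128k-instruct")

prompts = ["Explain FP8 quantization in one sentence."]
sampling = SamplingParams(max_tokens=64)

# generate() returns one result per prompt, each carrying the decoded text.
for result in llm.generate(prompts, sampling):
    print(result.outputs[0].text)
```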

NVIDIA contributes actively to the open-source ecosystem, with more than 500 projects released under open-source licenses. Its long-standing collaboration with Microsoft spans DirectML acceleration, Azure cloud, generative AI research, and healthcare innovation. Developers can find the TensorRT-LLM implementations on GitHub and deploy them for optimized inference with the NVIDIA Triton Inference Server.
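
Once an engine is served through Triton, clients typically reach it over HTTP. The sketch below assumes a local Triton instance exposing the generate endpoint with an `ensemble` model and the `text_input`/`text_output` field names used in the tensorrtllm_backend examples; adapt these to your own model repository layout.

```python
import requests

# Assumed local Triton Inference Server serving a TensorRT-LLM engine.
# The model name and JSON field names depend on your model repository.
resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json={"text_input": "What is Phi-3 Mini?", "max_tokens": 64},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["text_output"])
```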

Read more at Nvidia: NVIDIA Accelerates Microsoft’s Open Phi-3 Mini Language Models