NVIDIA Releases Small Language Model With State-of-the-Art Accuracy

From Nvidia: 2024-08-21 12:00:55

NVIDIA has released the Mistral-NeMo-Minitron 8B language model, which delivers state-of-the-art accuracy in a compact form optimized for AI applications. A miniaturized version of the Mistral NeMo 12B model, it is well suited to chatbots, virtual assistants, and educational tools, and is small enough to run on NVIDIA RTX workstations.

Distilled from 12 billion parameters down to 8 billion, the model delivers accuracy comparable to the original at a lower computational cost. Small language models like Minitron 8B can run in real time on workstations and laptops, making it easier for organizations to deploy generative AI capabilities across their infrastructure.

Mistral-NeMo-Minitron 8B excels on nine popular benchmarks for language models, offering low latency and high throughput when packaged as an NVIDIA NIM microservice. Developers using NVIDIA AI Foundry can customize and optimize smaller versions of the model for enterprise-specific applications.
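NIM microservices expose an OpenAI-compatible HTTP API, so a deployed endpoint can be queried with standard client libraries. The sketch below assumes a locally running NIM; the base URL, API key, and model identifier are placeholders to be replaced with the values for a specific deployment.

```python
# Minimal sketch of querying a NIM microservice via its OpenAI-compatible API.
# The endpoint, key, and model name below are assumptions, not confirmed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # assumed local NIM endpoint
    api_key="not-needed-for-local-nim",    # placeholder; hosted endpoints require a real key
)

response = client.chat.completions.create(
    model="nvidia/mistral-nemo-minitron-8b-8k-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize what a NIM microservice is."}],
    max_tokens=128,
    temperature=0.2,
)
print(response.choices[0].message.content)
```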

The team at NVIDIA achieved high accuracy with a smaller model using a combination of pruning and distillation. Pruning shrinks the network by removing the model weights that contribute least to accuracy; distillation then retrains the pruned model on a small dataset to recover and boost accuracy. Combined, the two steps require far less compute and training data than training a smaller model from scratch.
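To make the two steps concrete, here is a toy illustration of pruning followed by distillation. It is only a sketch: it uses simple magnitude pruning and a KL-divergence distillation loss on synthetic data, whereas NVIDIA's actual pipeline applies structured pruning to Mistral NeMo 12B and retrains at scale. All model sizes, data, and hyperparameters here are placeholders.

```python
# Toy sketch: magnitude pruning of a copied "teacher" network, then retraining
# the pruned "student" to match the teacher's outputs (knowledge distillation).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in teacher model over a 10-class task; sizes are arbitrary placeholders.
teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))
student = copy.deepcopy(teacher)  # the student starts as a copy of the teacher

# --- Pruning: zero out the weights with the smallest magnitude in each layer. ---
def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                w = module.weight
                threshold = w.abs().flatten().quantile(sparsity)
                w.mul_((w.abs() >= threshold).float())

magnitude_prune(student, sparsity=0.5)

# --- Distillation: retrain the pruned student on a small dataset to match the teacher. ---
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0

for step in range(100):
    x = torch.randn(64, 32)                      # stand-in for a small retraining dataset
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between the softened teacher and student output distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the pruned model inherits the teacher's weights rather than starting from random initialization, far fewer retraining steps and far less data are needed to close the accuracy gap.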

Alongside Mistral-NeMo-Minitron 8B, NVIDIA has released Nemotron-Mini-4B-Instruct, another small language model optimized for low memory usage and fast response times on NVIDIA GeForce RTX AI PCs and laptops. Both models are available as NVIDIA NIM microservices for cloud and on-device deployment and are part of the NVIDIA ACE suite of digital human technologies powered by generative AI.



Read more at Nvidia: NVIDIA Releases Small Language Model With State-of-the-Art Accuracy