The Hao AI Lab at UC San Diego received an NVIDIA DGX B200 system to advance its work on large language model inference. Many LLM-serving platforms, including NVIDIA Dynamo, draw on concepts developed in the lab. The DGX B200 accelerates projects such as FastVideo and Lmgame-Bench, pushing LLMs toward real-time responsiveness.

DistServe introduced disaggregated serving to balance system throughput against user latency. It measures this trade-off with goodput, the number of requests served per second that meet latency targets, which ties serving cost to service quality. By splitting the prefill and decode phases of inference onto different GPUs, developers can tune each phase independently, improve goodput, and scale workloads efficiently at low latency.
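To make the goodput idea concrete, here is a minimal sketch of how one might compute it from per-request latency measurements. The field names and SLO thresholds (time to first token for the prefill phase, time per output token for the decode phase) are illustrative assumptions, not DistServe's actual API.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    """Hypothetical per-request latency record (names are illustrative)."""
    ttft_ms: float  # time to first token: dominated by the prefill phase
    tpot_ms: float  # time per output token: dominated by the decode phase


def goodput(requests, window_s, ttft_slo_ms=200.0, tpot_slo_ms=50.0):
    """Requests per second that meet BOTH latency SLOs (assumed thresholds)."""
    ok = sum(
        1
        for r in requests
        if r.ttft_ms <= ttft_slo_ms and r.tpot_ms <= tpot_slo_ms
    )
    return ok / window_s


# Example: three requests observed over a one-second window.
stats = [
    RequestStats(ttft_ms=150, tpot_ms=40),  # meets both SLOs
    RequestStats(ttft_ms=300, tpot_ms=30),  # misses the prefill SLO
    RequestStats(ttft_ms=180, tpot_ms=60),  # misses the decode SLO
]
print(goodput(stats, window_s=1.0))  # only 1 of 3 requests counts
```

Because prefill latency and decode latency have different hardware profiles, running the two phases on separate GPUs lets each pool be sized to its own SLO, which is what raises goodput per GPU.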

The DGX B200's impact extends beyond the Hao AI Lab to healthcare and biology research at UC San Diego, where the system powers new AI platforms and accelerates research projects across departments. The NVIDIA Dynamo framework supports scaling disaggregated inference for generative AI models with high efficiency and low cost.

Read more at NVIDIA: UC San Diego Lab Advances Generative AI With NVIDIA DGX B200