Scaling LLMs with NVIDIA Triton and NVIDIA TensorRT-LLM Using Kubernetes

Large language models (LLMs) are widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art LLMs and foundation models, such as Llama, Gemma, GPT, and Nemotron, have demonstrated human-like understanding and generative abilities. Thanks to these models, AI developers do not need to go through the expensive and time-consuming training process …

NVIDIA Contributes Blackwell Platform Design to Open Hardware Ecosystem, Accelerating AI Infrastructure Innovation

NVIDIA GB200 NVL72 Design Contributions and NVIDIA Spectrum-X to Help Accelerate Next Industrial Revolution

OCP Global Summit—To drive the development of open, efficient and scalable data center technologies, NVIDIA today announced that it has contributed foundational elements of its NVIDIA Blackwell accelerated computing platform design to the Open Compute Project (OCP) and broadened NVIDIA Spectrum-X™ …

Supermicro Launches NVIDIA BlueField-Powered JBOF to Optimize AI Storage

The growth of AI is driving exponential growth in computing power and a doubling of networking speeds every few years. Less well known is that it's also putting new demands on storage. Training new models typically requires high-bandwidth networked access to petabytes of data, while inference with the latest types of retrieval-augmented generation (RAG) requires …