NVIDIA today reported revenue for the second quarter ended July 28, 2024, of $30.0 billion, up 15% from the previous quarter and up 122% from a year ago.
Month: August 2024
Trillions of PDF files are generated every year, each file likely consisting of multiple pages filled with various content types, including text, images, charts, and tables. This goldmine of data can only be used as quickly as humans can read and understand it. But with generative AI and retrieval-augmented generation (RAG), this untapped data can be used to uncover business insights that…
Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a highly optimized inference engine are required for high-throughput, low-latency inference. MLPerf Inference v4.1 is the latest version of the popular and widely recognized MLPerf Inference benchmarks, developed by the MLCommons…
Six years ago, we embarked on a journey to develop an AI inference serving solution specifically designed for high-throughput and time-sensitive production use cases from the ground up. At that time, ML developers were deploying bespoke, framework-specific AI solutions, which were driving up their operational costs and not meeting their latency and throughput service level agreements.
NVIDIA TAO is a framework designed to simplify and accelerate the development and deployment of AI models. It enables you to use pretrained models, fine-tune them with your own data, and optimize the models for specific use cases without needing deep AI expertise. TAO integrates seamlessly with the NVIDIA hardware and software ecosystem, providing tools for efficient AI model training…
As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data…
Today’s large language models (LLMs) achieve unprecedented results across many use cases. Yet because foundation models are general-purpose, application developers often need to customize and tune them for their specific use cases. Full fine-tuning updates the model weights and requires a large amount of data and compute infrastructure.
As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that real-time generative AI applications demand. Performance depends on both the ability of the combined GPUs to process requests as “one mighty GPU” with ultra-fast GPU-to-GPU communication and advanced software able to take full…
Large language models are driving some of the most exciting developments in AI with their ability to quickly understand, summarize and generate text-based content.
This post is the third in a series on building multi-camera tracking vision AI applications. The first and second parts introduced the overall end-to-end workflow and the fine-tuning process for enhancing system accuracy. NVIDIA Metropolis is an application framework and set of developer tools that leverages AI for visual data analysis across industries. Its multi-camera tracking reference…