Categories
Misc

New Scaling Algorithm and Initialization with NVIDIA Collective Communications Library 2.23

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVLink, or networking. It uses advanced topology detection, optimized communication graphs…

Source

Categories
Misc

Just Released: NVIDIA cuDNN 9.7

Bringing support for NVIDIA Blackwell architecture across data center and GeForce products, NVIDIA cuDNN 9.7 delivers speedups of up to 84% for FP8 Flash Attention operations and optimized GEMM capabilities with advanced fusion support to accelerate deep learning workloads.

Source

Categories
Misc

Just Released: CUTLASS 3.8

CUTLASS 3.8 extends support to NVIDIA Blackwell SM100 architecture with 99% peak performance for Tensor Core operations, bringing essential features like Mixed Input GEMMs for efficient model quantization and Grouped GEMM capabilities that accelerate MoE models through parallel expert computation.

Source

Categories
Misc

Accelerate DeepSeek Reasoning Models With NVIDIA GeForce RTX 50 Series AI PCs

The recently released DeepSeek-R1 model family has brought a new wave of excitement to the AI community, allowing enthusiasts and developers to run state-of-the-art reasoning models with problem-solving, math and code capabilities, all from the privacy of local PCs. With up to 3,352 trillion operations per second of AI horsepower, NVIDIA GeForce RTX 50 Series
Read Article

Categories
Misc

The AI tools for Art Newsletter – Issue 1

Categories
Misc

DeepSeek-R1 Now Live With NVIDIA NIM

DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, conducting chain-of-thought, consensus and search methods to generate the best answer. Performing this sequence of inference passes — using reason to arrive at the best answer — is known as
Read Article

Categories
Misc

New NVIDIA AI Blueprint: Build a Customizable RAG Pipeline

Connect AI applications to enterprise data using embedding and reranking models for information retrieval.

Source

Categories
Misc

Build Apps with Neural Rendering Using NVIDIA Nsight Developer Tools on GeForce RTX 50 Series GPUs

The next generation of NVIDIA graphics hardware has arrived. Powered by NVIDIA Blackwell, GeForce RTX 50 Series GPUs deliver groundbreaking new RTX features such as DLSS 4 with Multi Frame Generation, and NVIDIA RTX Kit with RTX Mega Geometry and RTX Neural Shaders. NVIDIA RTX Blackwell architecture introduces fifth-generation Tensor Cores to drive AI workloads and fourth-generation RT Cores with…

Source

Categories
Misc

Mastering the cudf.pandas Profiler for GPU Acceleration

In the world of Python data science, pandas has long reigned as the go-to library for intuitive data manipulation and analysis. However, as data volumes grow, CPU-bound pandas workflows can become a bottleneck. That’s where cuDF and its pandas accelerator mode step in. This mode accelerates operations with GPUs whenever possible, seamlessly falling back to the CPU for unsupported…

Source

Categories
Misc

How to Use OpenUSD

Universal Scene Description (OpenUSD) is an open, extensible framework and ecosystem with APIs for composing, editing, querying, rendering, collaborating, and simulating within 3D virtual worlds. This post explains how you can start using OpenUSD today with your existing assets and tools and what steps you can take to iteratively up-level your USD workflows. For an interactive…

Source