CUDA Installation (NVIDIA GPUs)
If you have an NVIDIA graphics card, you can significantly accelerate model inference by offloading computations to the GPU using CUDA. This is highly recommended for a much faster and smoother experience.
🧠 Key Concepts for Beginners
What is a GPU?
A GPU (Graphics Processing Unit) is a specialized processor designed to handle many small, repetitive tasks simultaneously. While a CPU is like a brilliant mathematician who can solve one complex problem at a time, a GPU is like a thousand students who can all solve simple arithmetic problems at the exact same moment. Large Language Models involve massive amounts of simple math, making them perfect for GPUs.
What is CUDA?
CUDA is a parallel computing platform and programming model created by NVIDIA. It allows software (like llama.cpp) to “speak” directly to the GPU, telling it how to use those thousands of tiny “student” processors to speed up the math.
🛠️ Prerequisites
In addition to the core tools mentioned in the Prerequisites guide, you must have:
- NVIDIA GPU: A compatible NVIDIA graphics card.
- NVIDIA CUDA Toolkit: This provides the necessary libraries and compilers.
- Download: NVIDIA CUDA Toolkit Downloads (https://developer.nvidia.com/cuda-downloads).
- Installation: Follow the standard Windows installer.
- Verification: Open a terminal and type:
nvcc --version
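If the toolkit is installed correctly, nvcc --version should report the compiler release (for example “Cuda compilation tools, release 12.x”; the exact numbers depend on the version you installed). It is also worth confirming that the NVIDIA driver can see your card: nvidia-smi ships with the driver and lists the detected GPU, the driver version, and the highest CUDA version that driver supports.
nvidia-smi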
🚀 Step-by-Step Installation
The process is very similar to the CPU installation, but with one critical difference in the CMake configuration step.
1. Clone the Repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
2. Create a Build Directory
mkdir build
cd build
3. Configure the Build with CUDA Support
This is the most important step. We tell CMake to enable the CUDA backend by using the -DGGML_CUDA=ON flag.
cmake .. -DGGML_CUDA=ON
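Optionally, you can also restrict the build to your card’s architecture with CMake’s standard CMAKE_CUDA_ARCHITECTURES variable, which can noticeably shorten compile times. This is only a sketch: the value 86 below corresponds to RTX 30-series (Ampere) cards, so look up your own GPU’s compute capability before copying it.
cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86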
4. Compile the Project
cmake --build . --config Release
✅ Verification
Once the build is finished, verify that the executable was built with CUDA support. You can check the output of the help command or, more reliably, look for CUDA-related output when running a model.
To verify the build, run:
./bin/Release/llama-cli.exe --help
To confirm CUDA is actually working: When you run a model (see CLI Usage), look at the initial log output in your terminal. You should see lines indicating that CUDA or cuBLAS is being used, and you should see your GPU being detected.
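As a rough sketch (the model path is a placeholder for whatever GGUF file you have downloaded), a GPU-accelerated run looks like this; the -ngl (--n-gpu-layers) flag controls how many model layers are offloaded to the GPU, and a large value like 99 effectively offloads every layer of a typical model.
./bin/Release/llama-cli.exe -m path/to/model.gguf -ngl 99 -p "Hello"
With CUDA working, the startup log should mention your GPU by name and report how many layers were offloaded; if everything stays on the CPU, the build did not pick up CUDA.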
💡 Troubleshooting
- “CUDA not found” error during CMake: This usually means the CUDA Toolkit is not in your system’s PATH. Ensure you have installed the toolkit and restarted your terminal; a quick way to check and work around this is sketched after this list.
- Compilation errors related to CUDA: Ensure your NVIDIA driver is up to date. Sometimes, installing the latest driver from NVIDIA’s website fixes compatibility issues with the CUDA Toolkit.
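If CMake cannot find CUDA, first check whether the compiler is reachable from your terminal, and if it is not, either add the toolkit’s bin folder to your PATH or point CMake at nvcc directly. The path below assumes a default install of a 12.4 toolkit, so adjust it to the version you actually installed:
where nvcc
cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_COMPILER="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/bin/nvcc.exe"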
Last Updated: 2026-05-03