Ollama (Download and run large language models locally)

Ollama is an application that lets you run large language models
offline.

If present, optional dependencies such as CUDA or ROCm are detected
automatically when the ollama libraries are compiled.

CUDA=ON: build with CUDA support (default is CUDA=OFF). Build and
runtime dependencies:
  development/cudatoolkit_13
  system/nvidia-driver or system/nvidia-legacy580-driver

ROCM=ON: build with ROCm support (default is ROCM=OFF). Build and
runtime dependencies:
  development/rocmtoolkit_7
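
The CUDA and ROCM options above are assumed here to be passed through
the environment of the build script, as is conventional for SlackBuilds.
The sketch below shows one way to do that. The helper name build_with
and the default script name ollama.SlackBuild are illustrative
assumptions, not part of this package.

```shell
#!/bin/sh
# build_with OPTION [SCRIPT]: run the build script with OPTION=ON set in
# its environment. Hypothetical helper; the default script name is an
# assumption based on the usual SlackBuild naming.
build_with() {
    opt=$1
    script=${2:-./ollama.SlackBuild}
    if [ ! -f "$script" ]; then
        echo "build script not found: $script" >&2
        return 1
    fi
    # env sets the option only for this one invocation.
    env "$opt=ON" sh "$script"
}
```

Under these assumptions, "build_with CUDA" run as root from the build
directory would be equivalent to typing "CUDA=ON sh ./ollama.SlackBuild".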

Building ollama requires network access and google-go-lang.

To verify the installation:

$ nohup ollama serve &
$ ollama --version
$ ollama run gemma3:270m
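
The server started in the background may need a moment before it
accepts requests. A minimal sketch of a wait loop follows, assuming curl
is installed and the server listens on its default address
127.0.0.1:11434. The helper name wait_for_server is hypothetical and not
part of ollama.

```shell
#!/bin/sh
# wait_for_server URL [TRIES]: poll URL about once per second until it
# answers. Returns 0 on the first success, 1 if every attempt fails.
wait_for_server() {
    url=$1
    tries=${2:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f: treat HTTP errors as failure; -m 2: give up after 2 seconds.
        if curl -fsS -m 2 "$url" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

For example, "wait_for_server http://127.0.0.1:11434/api/version" could
be run between "ollama serve" and "ollama run" to confirm that the
server is reachable first.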

See also:
https://docs.ollama.com/faq
https://docs.ollama.com/quickstart