Setting Up Ollama and Open WebUI with GPU Passthrough on Proxmox

This post is here mostly so I can remember how to set up a complete local AI stack on Proxmox, from GPU passthrough to running my first models. Feel free to give it a try.


Prerequisites

Hardware Requirements:

  • NVIDIA GPU (RTX 3060, 3070, 4060, etc.)
  • CPU with virtualization support (Intel VT-d or AMD-Vi)
  • Sufficient RAM (16GB minimum, 32GB recommended)
  • Proxmox VE installed

What You’ll Need:

  • Ubuntu 22.04 or 24.04 ISO
  • SSH access to your Proxmox host
  • Basic Linux command line knowledge

1. Enable IOMMU on Proxmox Host {#enable-iommu}

SSH into your Proxmox host and enable IOMMU for GPU passthrough.

For Intel CPUs:

# Edit GRUB configuration
nano /etc/default/grub

# Find the line starting with GRUB_CMDLINE_LINUX_DEFAULT
# Change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# Save and exit (Ctrl+X, Y, Enter)

# Update GRUB
update-grub

# Reboot Proxmox host
reboot

For AMD CPUs:

# Edit GRUB configuration
nano /etc/default/grub

# Find the line starting with GRUB_CMDLINE_LINUX_DEFAULT
# Change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

# Save and exit (Ctrl+X, Y, Enter)

# Update GRUB
update-grub

# Reboot Proxmox host
reboot
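
Note: If your Proxmox host boots via systemd-boot instead of GRUB (typical for ZFS-on-root installs), the GRUB file is ignored. Append the same parameters to the single line in /etc/kernel/cmdline and refresh the boot configuration:

# Append intel_iommu=on iommu=pt (or amd_iommu=on iommu=pt) to the kernel command line
nano /etc/kernel/cmdline

# Apply the change and reboot
proxmox-boot-tool refresh
reboot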

Verify IOMMU is enabled:

dmesg | grep -e DMAR -e IOMMU

You should see output indicating IOMMU is enabled.
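
Optionally, list the IOMMU groups to confirm the GPU (and its audio function) sits in its own group, which it needs for clean passthrough. A small loop over sysfs:

# List every IOMMU group and the devices it contains
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    echo "  $(lspci -nns "${d##*/}")"
  done
done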


2. Configure GPU Passthrough {#configure-gpu-passthrough}

Load required kernel modules:

# Edit modules file
nano /etc/modules

# Add these lines:
vfio
vfio_iommu_type1
vfio_pci
# vfio_virqfd is only needed on Proxmox 7 (kernel < 6.2);
# on Proxmox 8+ it is built into vfio and this line can be omitted
vfio_virqfd

# Save and exit

Blacklist GPU drivers on host:

# Create blacklist file
nano /etc/modprobe.d/blacklist.conf

# Add these lines (nouveau/nvidia cover NVIDIA cards; the radeon/amdgpu lines apply to AMD cards):
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
blacklist radeon
blacklist amdgpu

# Save and exit

Find your GPU’s PCI ID:

lspci -nn | grep -i nvidia

Output example:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060] [10de:2503]
01:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e]

Note the IDs: 10de:2503 and 10de:228e

Bind GPU to VFIO:

# Edit VFIO configuration
nano /etc/modprobe.d/vfio.conf

# Add your GPU IDs (replace with your IDs from above):
options vfio-pci ids=10de:2503,10de:228e disable_vga=1

# Save and exit

Update initramfs and reboot:

update-initramfs -u -k all
reboot

Verify VFIO is in use:

lspci -k | grep -A 3 -i "VGA"

You should see Kernel driver in use: vfio-pci


3. Create Ubuntu VM {#create-ubuntu-vm}

Using Proxmox Web GUI:

  1. Upload Ubuntu ISO to Proxmox storage
  2. Click “Create VM”
  3. General:
    • VM ID: 100 (or your choice)
    • Name: ollama-server
  4. OS:
    • Select Ubuntu ISO
  5. System:
    • Machine: q35
    • BIOS: OVMF (UEFI)
    • Add EFI Disk
  6. Disks:
    • 50GB or more
  7. CPU:
    • 4+ cores recommended
    • Type: host
  8. Memory:
    • 16GB minimum (16384 MB)
    • 32GB recommended (32768 MB)
  9. Network:
    • Default bridge is fine

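If you prefer the shell, roughly the same VM can be created with qm. The storage name (local-lvm) and ISO filename below are assumptions; adjust them for your setup:

# Create the VM from the Proxmox host (adjust storage, ISO name, and sizes)
qm create 100 \
  --name ollama-server \
  --machine q35 \
  --bios ovmf \
  --efidisk0 local-lvm:1 \
  --cores 4 --cpu host \
  --memory 16384 \
  --scsihw virtio-scsi-pci \
  --scsi0 local-lvm:50 \
  --net0 virtio,bridge=vmbr0 \
  --ide2 local:iso/ubuntu-24.04-live-server-amd64.iso,media=cdrom \
  --ostype l26
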
Add GPU to VM (Command Line):

# Replace 100 with your VM ID and adjust the PCI address
# Passing 01:00 (no .0/.1 suffix) passes all functions, including the GPU's audio device
qm set 100 -hostpci0 01:00,pcie=1,x-vga=1

Start VM and Install Ubuntu:

Start the VM and complete a standard Ubuntu installation.


4. Install NVIDIA Drivers in VM {#install-nvidia-drivers}

SSH or console into your Ubuntu VM:

# Update system
sudo apt update
sudo apt upgrade -y

# Add NVIDIA driver repository
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update

# Install recommended NVIDIA driver
sudo ubuntu-drivers autoinstall

# Or install specific version (e.g., 570):
# sudo apt install nvidia-driver-570 -y

# Reboot VM
sudo reboot

Verify NVIDIA driver installation:

nvidia-smi

You should see your GPU information displayed.


5. Install Docker {#install-docker}

# Remove old Docker versions if any
sudo apt remove docker docker-engine docker.io containerd runc

# Install prerequisites
sudo apt update
sudo apt install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Set up Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Add your user to the docker group ($USER expands to your current username)
sudo usermod -aG docker $USER

# Log out and back in for group changes to take effect
# Or run:
newgrp docker

# Verify Docker installation
docker --version
docker run hello-world

6. Install NVIDIA Container Toolkit {#install-nvidia-container-toolkit}

This allows Docker containers to access your GPU.

# Add NVIDIA Container Toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Update package list
sudo apt update

# Install NVIDIA Container Toolkit
sudo apt install -y nvidia-container-toolkit

# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker
sudo systemctl restart docker

Test GPU access in Docker:

docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi

You should see nvidia-smi output from inside the container. If this works, GPU passthrough is successful!


7. Deploy Ollama {#deploy-ollama}

# Create volume for Ollama data
docker volume create ollama

# Run Ollama container with GPU support
docker run -d \
  --name ollama \
  --restart always \
  -p 11434:11434 \
  --gpus all \
  -v ollama:/root/.ollama \
  ollama/ollama

# Verify Ollama is running
docker ps

# Check Ollama logs
docker logs ollama

# Test Ollama API
curl http://localhost:11434/api/tags

You should see an empty models list (we’ll add models next).
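
With no models pulled yet, the JSON response is just an empty list:

{"models":[]}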


8. Deploy Open WebUI {#deploy-open-webui}

# Create volume for Open WebUI data
docker volume create open-webui

# Run Open WebUI container
docker run -d \
  --name open-webui \
  --restart always \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

# Verify Open WebUI is running
docker ps

# Check Open WebUI logs
docker logs open-webui
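
If you'd rather manage both containers as one stack, here's a Docker Compose sketch equivalent to the two docker run commands above (save as docker-compose.yml and start with docker compose up -d). Inside a Compose network, Open WebUI can reach Ollama by service name, so host.docker.internal isn't needed:

services:
  ollama:
    image: ollama/ollama
    restart: always
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: always
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  open-webui:

Note that Compose creates its own named volumes (prefixed with the project name), so models pulled into the earlier ollama volume won't automatically carry over.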

Access Open WebUI:

Open your browser and navigate to:

http://YOUR_VM_IP:3000

Replace YOUR_VM_IP with your VM’s IP address (e.g., http://192.168.1.100:3000)

First-time setup:

  1. You’ll be prompted to create an admin account
  2. Choose a username and password
  3. The account is stored locally; no external authentication service is involved

9. Download and Test Models {#download-models}

Method 1: Command Line

# Download Llama 3.2 (3B parameters - fast and efficient)
docker exec -it ollama ollama pull llama3.2

# Download Llama 3.1 8B (larger, more capable; llama3.2 only ships in 1B and 3B sizes)
docker exec -it ollama ollama pull llama3.1:8b

# Download Qwen 2.5 Coder (excellent for programming)
docker exec -it ollama ollama pull qwen2.5-coder:7b

# Download Mistral (fast general-purpose model)
docker exec -it ollama ollama pull mistral

# Download LLaVA (vision model - can analyze images)
docker exec -it ollama ollama pull llava:7b

Method 2: Through Open WebUI

  1. Open Open WebUI in your browser
  2. Click the model dropdown at the top
  3. Type the model name (e.g., llama3.2)
  4. Click “Pull model” or select it to auto-download

List downloaded models:

docker exec -it ollama ollama list

Test a model from command line:

# Start interactive chat
docker exec -it ollama ollama run llama3.2

# Type your question and press Enter
# Type /bye to exit
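
The same model can also be queried over Ollama's HTTP API (which is what Open WebUI uses under the hood). A minimal non-streaming request, assuming llama3.2 is already pulled:

# Single prompt, full response returned at once
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'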

Monitor GPU usage while running:

# In another terminal
watch -n 1 nvidia-smi

You should see:

  • GPU utilization spike to 50-100%
  • VRAM usage increase (5-6GB for 7B models)
  • The ollama process listed
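
If the full nvidia-smi table is too noisy, query mode prints just the numbers you care about:

# Refresh VRAM and GPU utilization every second
watch -n 1 "nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv"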

Recommended Models by Use Case

General Use (7-8B):

docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull gemma2:9b

Coding:

docker exec -it ollama ollama pull qwen2.5-coder:7b
docker exec -it ollama ollama pull codellama:7b
docker exec -it ollama ollama pull deepseek-coder-v2:16b

Creative Writing:

docker exec -it ollama ollama pull dolphin-mistral
docker exec -it ollama ollama pull neural-chat

Vision (Image Analysis):

docker exec -it ollama ollama pull llava:7b
docker exec -it ollama ollama pull llama3.2-vision

Small & Fast (3-4B):

docker exec -it ollama ollama pull llama3.2:3b
docker exec -it ollama ollama pull phi3:mini
docker exec -it ollama ollama pull gemma2:2b

Useful Commands

Managing Docker Containers:

# View running containers
docker ps

# View all containers (including stopped)
docker ps -a

# Stop containers
docker stop ollama open-webui

# Start containers
docker start ollama open-webui

# Restart containers
docker restart ollama open-webui

# View logs
docker logs ollama
docker logs open-webui

# Follow logs in real-time
docker logs -f ollama

Managing Ollama Models:

# List downloaded models
docker exec -it ollama ollama list

# Remove a model
docker exec -it ollama ollama rm llama3.2

# Show model information
docker exec -it ollama ollama show llama3.2

System Monitoring:

# Monitor GPU usage
nvidia-smi

# Watch GPU usage continuously
watch -n 1 nvidia-smi

# Check disk usage
df -h

# Check Docker disk usage
docker system df

Troubleshooting

GPU not detected in VM:

# Check if GPU is visible
lspci | grep -i nvidia

# Verify VFIO is loaded
lsmod | grep vfio

# Check dmesg for errors
dmesg | grep -i vfio

Docker can’t access GPU:

# Verify NVIDIA Container Toolkit is installed
nvidia-ctk --version

# Check Docker runtime configuration
docker info | grep -i runtime

# Test GPU access
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi

Open WebUI can’t connect to Ollama:

# Test Ollama API from VM
curl http://localhost:11434/api/tags

# Check if Ollama is running
docker ps | grep ollama

# Check Ollama logs for errors
docker logs ollama

Models running slowly:

# Verify GPU is being used
nvidia-smi

# Check if model fits in VRAM
# 7B models need ~5-6GB VRAM
# 13B models need ~8-10GB VRAM
# If VRAM is full, try smaller models
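
It's also worth checking whether the model fully fits on the GPU or is spilling over to the CPU; ollama ps reports the split:

# Show loaded models and their CPU/GPU split
docker exec -it ollama ollama ps

If the PROCESSOR column shows a mixed CPU/GPU split rather than 100% GPU, the model doesn't fit in VRAM and will run much slower.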

Performance Tips

  1. Use quantized models for better performance on limited VRAM (look for Q4, Q5 variants)
  2. Keep models loaded – by default Ollama unloads a model about five minutes after its last request; raise OLLAMA_KEEP_ALIVE to keep it in VRAM for fast responses (see the sketch after this list)
  3. Monitor VRAM usage – don’t exceed your GPU’s capacity
  4. Use SSD storage – faster model loading times
  5. Allocate enough system RAM – helps when models don’t fit entirely in VRAM
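
For tip 2, a sketch of re-creating the Ollama container with a longer keep-alive (OLLAMA_KEEP_ALIVE accepts durations like 1h, or -1 to keep models loaded indefinitely):

docker rm -f ollama
docker run -d \
  --name ollama \
  --restart always \
  -p 11434:11434 \
  --gpus all \
  -e OLLAMA_KEEP_ALIVE=1h \
  -v ollama:/root/.ollama \
  ollama/ollama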

Security Considerations

Firewall Configuration:

# Install UFW if not already installed
sudo apt install ufw

# Allow SSH
sudo ufw allow 22/tcp

# Allow Open WebUI (only from local network)
sudo ufw allow from 192.168.0.0/16 to any port 3000

# Allow Ollama API from the local network (only needed if other machines call the API directly)
sudo ufw allow from 192.168.0.0/16 to any port 11434

# Enable firewall
sudo ufw enable

# Check status
sudo ufw status

Best Practices:

  • Keep Ollama and Open WebUI on your local network
  • Don’t expose ports to the internet without authentication
  • Regularly update Docker images: docker pull ollama/ollama && docker pull ghcr.io/open-webui/open-webui:main, then recreate the containers so they run the new images
  • Back up your volumes: docker run --rm -v ollama:/data -v $(pwd):/backup ubuntu tar czf /backup/ollama-backup.tar.gz /data (a restore sketch follows below)
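
To restore that backup into a fresh ollama volume (the archive stores paths relative to /, so extracting from / recreates /data):

# Stop Ollama, unpack the archive into the volume, start it again
docker stop ollama
docker run --rm -v ollama:/data -v $(pwd):/backup ubuntu \
  bash -c "cd / && tar xzf /backup/ollama-backup.tar.gz"
docker start ollama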

Conclusion

You now have a complete, self-hosted AI platform running on your Proxmox homelab! You can:

  • Run AI models entirely on your own hardware
  • Maintain complete privacy (no data sent to external APIs)
  • Experiment with different models without cost or rate limits
  • Learn how modern AI systems work under the hood

Next Steps:

  • Experiment with different models
  • Try vision models with image analysis
  • Fine-tune models on your own data
  • Set up multiple VMs for different use cases
  • Explore Ollama’s API for custom integrations

Enjoy your local AI setup!