This post is here mostly so I can remember how to set up a complete local AI stack on Proxmox, from GPU passthrough to running my first models. Feel free to give it a try.
Prerequisites
Hardware Requirements:
- NVIDIA GPU (RTX 3060, 3070, 4060, etc.)
- CPU with virtualization support (Intel VT-d or AMD-Vi)
- Sufficient RAM (16GB minimum, 32GB recommended)
- Proxmox VE installed
What You’ll Need:
- Ubuntu 22.04 or 24.04 ISO
- SSH access to your Proxmox host
- Basic Linux command line knowledge
1. Enable IOMMU on Proxmox Host {#enable-iommu}
SSH into your Proxmox host and enable IOMMU for GPU passthrough.
For Intel CPUs:
# Edit GRUB configuration
nano /etc/default/grub
# Find the line starting with GRUB_CMDLINE_LINUX_DEFAULT
# Change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# Save and exit (Ctrl+X, Y, Enter)
# Update GRUB
update-grub
# Reboot Proxmox host
reboot
For AMD CPUs:
# Edit GRUB configuration
nano /etc/default/grub
# Find the line starting with GRUB_CMDLINE_LINUX_DEFAULT
# Change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
# Save and exit (Ctrl+X, Y, Enter)
# Update GRUB
update-grub
# Reboot Proxmox host
reboot
Verify IOMMU is enabled:
dmesg | grep -e DMAR -e IOMMU
You should see output indicating IOMMU is enabled.
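The exact wording varies by kernel and CPU, but expect something like DMAR: IOMMU enabled on Intel or AMD-Vi: Interrupt remapping enabled on AMD.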
2. Configure GPU Passthrough {#configure-gpu-passthrough}
Load required kernel modules:
# Edit modules file
nano /etc/modules
# Add these lines:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
# On newer kernels (Proxmox VE 8+), vfio_virqfd is merged into vfio and this line can be omitted
# Save and exit
Blacklist GPU drivers on host:
# Create blacklist file
nano /etc/modprobe.d/blacklist.conf
# Add these lines (blacklists the NVIDIA host drivers; the radeon/amdgpu entries are harmless on an NVIDIA-only host):
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
blacklist radeon
blacklist amdgpu
# Save and exit
Find your GPU’s PCI ID:
lspci -nn | grep -i nvidia
Output example:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060] [10de:2503]
01:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e]
Note the IDs: 10de:2503 and 10de:228e
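Optionally, check how your IOMMU groups are laid out (a small sketch; group numbers and devices will differ on your system). Ideally the two NVIDIA functions share a group that contains no unrelated devices:
# List every IOMMU group and the devices it contains
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    lspci -nns "${d##*/}"
  done
done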
Bind GPU to VFIO:
# Edit VFIO configuration
nano /etc/modprobe.d/vfio.conf
# Add your GPU IDs (replace with your IDs from above):
options vfio-pci ids=10de:2503,10de:228e disable_vga=1
# Save and exit
Update initramfs and reboot:
update-initramfs -u -k all
reboot
Verify VFIO is in use:
lspci -k | grep -A 3 -i "VGA"
You should see Kernel driver in use: vfio-pci
3. Create Ubuntu VM {#create-ubuntu-vm}
Using Proxmox Web GUI:
- Upload Ubuntu ISO to Proxmox storage
- Click “Create VM”
- General:
- VM ID: 100 (or your choice)
- Name: ollama-server
- OS:
- Select Ubuntu ISO
- System:
- Machine: q35
- BIOS: OVMF (UEFI)
- Add EFI Disk
- Disks:
- 50GB or more
- CPU:
- 4+ cores recommended
- Type: host
- Memory:
- 16GB minimum (16384 MB)
- 32GB recommended (32768 MB); leave some RAM free for the Proxmox host itself
- Network:
- Default bridge is fine
Add GPU to VM (Command Line):
# Replace 100 with your VM ID and adjust PCI address
qm set 100 -hostpci0 01:00,pcie=1,x-vga=1
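To double-check that the passthrough entry landed in the VM config, you can dump it with qm config (adjust the VM ID; the PCI address may be shown with a 0000: domain prefix):
# Verify the hostpci entry
qm config 100 | grep hostpci
# Expect something like: hostpci0: 01:00,pcie=1,x-vga=1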
Start VM and Install Ubuntu:
Start the VM and complete a standard Ubuntu installation.
4. Install NVIDIA Drivers in VM {#install-nvidia-drivers}
SSH or console into your Ubuntu VM:
# Update system
sudo apt update
sudo apt upgrade -y
# Add NVIDIA driver repository
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update
# Install recommended NVIDIA driver
sudo ubuntu-drivers autoinstall
# Or install a specific version (e.g., 570):
# sudo apt install nvidia-driver-570 -y
# Reboot VM
sudo reboot
Verify NVIDIA driver installation:
nvidia-smi
You should see your GPU information displayed.
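For a quick scripted sanity check, nvidia-smi can also report just the GPU name, driver version, and total VRAM:
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv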
5. Install Docker {#install-docker}
# Remove old Docker versions if any
sudo apt remove docker docker-engine docker.io containerd runc
# Install prerequisites
sudo apt update
sudo apt install -y \
ca-certificates \
curl \
gnupg \
lsb-release
# Add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# Set up Docker repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Add your user to the docker group ($USER expands to your current username)
sudo usermod -aG docker $USER
# Log out and back in for group changes to take effect
# Or run:
newgrp docker
# Verify Docker installation
docker --version
docker run hello-world
6. Install NVIDIA Container Toolkit {#install-nvidia-container-toolkit}
This allows Docker containers to access your GPU.
# Add NVIDIA Container Toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Update package list
sudo apt update
# Install NVIDIA Container Toolkit
sudo apt install -y nvidia-container-toolkit
# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker
sudo systemctl restart docker
Test GPU access in Docker:
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
You should see nvidia-smi output from inside the container. If this works, GPU passthrough is successful!
7. Deploy Ollama {#deploy-ollama}
# Create volume for Ollama data
docker volume create ollama
# Run Ollama container with GPU support
docker run -d \
--name ollama \
--restart always \
-p 11434:11434 \
--gpus all \
-v ollama:/root/.ollama \
ollama/ollama
# Verify Ollama is running
docker ps
# Check Ollama logs
docker logs ollama
# Test Ollama API
curl http://localhost:11434/api/tags
You should see an empty models list (we’ll add models next).
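Before any models are pulled, the response is just an empty list, roughly:
{"models":[]}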
8. Deploy Open WebUI {#deploy-open-webui}
# Create volume for Open WebUI data
docker volume create open-webui
# Run Open WebUI container
docker run -d \
--name open-webui \
--restart always \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
# Verify Open WebUI is running
docker ps
# Check Open WebUI logs
docker logs open-webui
Access Open WebUI:
Open your browser and navigate to:
http://YOUR_VM_IP:3000
Replace YOUR_VM_IP with your VM’s IP address (e.g., http://192.168.1.100:3000)
First-time setup:
- You’ll be prompted to create an admin account
- Choose a username and password
- This is stored locally—no external authentication
9. Download and Test Models {#download-models}
Method 1: Command Line
# Download Llama 3.2 (3B parameters - fast and efficient)
docker exec -it ollama ollama pull llama3.2
# Download Llama 3.1 8B (larger, more capable; Llama 3.2 only ships in 1B and 3B text variants)
docker exec -it ollama ollama pull llama3.1:8b
# Download Qwen 2.5 Coder (excellent for programming)
docker exec -it ollama ollama pull qwen2.5-coder:7b
# Download Mistral (fast general-purpose model)
docker exec -it ollama ollama pull mistral
# Download LLaVA (vision model - can analyze images)
docker exec -it ollama ollama pull llava:7b
Method 2: Through Open WebUI
- Open Open WebUI in your browser
- Click the model dropdown at the top
- Type the model name (e.g., llama3.2)
- Click “Pull model” or select it to auto-download
List downloaded models:
docker exec -it ollama ollama list
Test a model from command line:
# Start interactive chat
docker exec -it ollama ollama run llama3.2
# Type your question and press Enter
# Type /bye to exit
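You can also exercise Ollama over its HTTP API instead of the interactive CLI; a minimal sketch against the /api/generate endpoint (assumes llama3.2 is already pulled):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain GPU passthrough in one sentence.",
  "stream": false
}'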
Monitor GPU usage while running:
# In another terminal
watch -n 1 nvidia-smi
You should see:
- GPU utilization spike to 50-100%
- VRAM usage increase (5-6GB for 7B models)
- The ollama process listed
Recommended Models by Use Case
General Use (3-9B):
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama pull mistral
docker exec -it ollama ollama pull gemma2:9b
Coding:
docker exec -it ollama ollama pull qwen2.5-coder:7b
docker exec -it ollama ollama pull codellama:7b
docker exec -it ollama ollama pull deepseek-coder-v2:16b
Creative Writing:
docker exec -it ollama ollama pull dolphin-mistral
docker exec -it ollama ollama pull neural-chat
Vision (Image Analysis):
docker exec -it ollama ollama pull llava:7b
docker exec -it ollama ollama pull llama3.2-vision
Small & Fast (2-4B):
docker exec -it ollama ollama pull llama3.2:3b
docker exec -it ollama ollama pull phi3:mini
docker exec -it ollama ollama pull gemma2:2b
Useful Commands
Managing Docker Containers:
# View running containers
docker ps
# View all containers (including stopped)
docker ps -a
# Stop containers
docker stop ollama open-webui
# Start containers
docker start ollama open-webui
# Restart containers
docker restart ollama open-webui
# View logs
docker logs ollama
docker logs open-webui
# Follow logs in real-time
docker logs -f ollama
Managing Ollama Models:
# List downloaded models
docker exec -it ollama ollama list
# Remove a model
docker exec -it ollama ollama rm llama3.2
# Show model information
docker exec -it ollama ollama show llama3.2
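Recent Ollama versions also include ollama ps, which shows which models are currently loaded and whether they are running on GPU or CPU:
# Show currently loaded models (newer Ollama versions)
docker exec -it ollama ollama ps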
System Monitoring:
# Monitor GPU usage
nvidia-smi
# Watch GPU usage continuously
watch -n 1 nvidia-smi
# Check disk usage
df -h
# Check Docker disk usage
docker system df
Troubleshooting
GPU not detected in VM:
# Check if GPU is visible
lspci | grep -i nvidia
# Verify VFIO is loaded
lsmod | grep vfio
# Check dmesg for errors
dmesg | grep -i vfio
Docker can’t access GPU:
# Verify NVIDIA Container Toolkit is installed
nvidia-ctk --version
# Check Docker runtime configuration
docker info | grep -i runtime
# Test GPU access
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
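If the test still fails, confirm that nvidia-ctk actually registered the NVIDIA runtime in Docker's daemon configuration (assuming the default config path):
# Inspect Docker's daemon configuration
cat /etc/docker/daemon.json
# There should be a "runtimes" entry pointing at nvidia-container-runtime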
Open WebUI can’t connect to Ollama:
# Test Ollama API from VM
curl http://localhost:11434/api/tags
# Check if Ollama is running
docker ps | grep ollama
# Check Ollama logs for errors
docker logs ollama
Models running slowly:
# Verify GPU is being used
nvidia-smi
# Check if model fits in VRAM
# 7B models need ~5-6GB VRAM
# 13B models need ~8-10GB VRAM
# If VRAM is full, try smaller models
Performance Tips
- Use quantized models for better performance on limited VRAM (look for Q4, Q5 variants)
- Keep models loaded – a loaded model stays in VRAM for fast responses until Ollama unloads it after idling (see the OLLAMA_KEEP_ALIVE sketch after this list)
- Monitor VRAM usage – don’t exceed your GPU’s capacity
- Use SSD storage – faster model loading times
- Allocate enough system RAM – helps when models don’t fit entirely in VRAM
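On the keep-models-loaded point: by default Ollama unloads a model after a few minutes of inactivity. If you prefer to trade VRAM for snappier responses, recreate the container from section 7 with a longer keep-alive; a sketch using the OLLAMA_KEEP_ALIVE environment variable:
docker rm -f ollama
docker run -d \
  --name ollama \
  --restart always \
  -p 11434:11434 \
  --gpus all \
  -v ollama:/root/.ollama \
  -e OLLAMA_KEEP_ALIVE=24h \
  ollama/ollama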
Security Considerations
Firewall Configuration:
# Install UFW if not already installed
sudo apt install ufw
# Allow SSH
sudo ufw allow 22/tcp
# Allow Open WebUI (only from local network)
sudo ufw allow from 192.168.0.0/16 to any port 3000
# Allow Ollama API (only if needed externally)
sudo ufw allow from 192.168.0.0/16 to any port 11434
# Enable firewall
sudo ufw enable
# Check status
sudo ufw status
Best Practices:
- Keep Ollama and Open WebUI on your local network
- Don’t expose ports to the internet without authentication
- Regularly update Docker images (then recreate the containers so they pick up the new images):
docker pull ollama/ollama && docker pull ghcr.io/open-webui/open-webui:main
- Back up your volumes (a restore sketch follows):
docker run --rm -v ollama:/data -v $(pwd):/backup ubuntu tar czf /backup/ollama-backup.tar.gz /data
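To restore that backup into the volume later (a sketch; stop the ollama container first so nothing writes to the volume while you restore):
docker stop ollama
docker run --rm -v ollama:/data -v $(pwd):/backup ubuntu tar xzf /backup/ollama-backup.tar.gz -C /
docker start ollama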
Conclusion
You now have a complete, self-hosted AI platform running on your Proxmox homelab! You can:
- Run AI models entirely on your own hardware
- Maintain complete privacy (no data sent to external APIs)
- Experiment with different models without cost or rate limits
- Learn how modern AI systems work under the hood
Next Steps:
- Experiment with different models
- Try vision models with image analysis
- Fine-tune models on your own data
- Set up multiple VMs for different use cases
- Explore Ollama’s API for custom integrations
Enjoy your local AI setup!