$ cd ../integrations/
🤖 AI Models · Recommended · v1.0+ · 100% local · free forever
$ cat ollama-integration.md
openclaw.useModel('ollama', { local: true })
/** OpenClaw + Ollama: The Ultimate Local AI Stack. Your data never leaves your machine. */
// Why Ollama is the default recommendation
🔒
100% Private & Sovereign
// No API calls. No token tracking. No data leaving your machine. Perfect for processing sensitive logs or personal documents.
💸
Free Forever
// Download 100+ open-weight models at zero cost. No rate limits, no subscriptions, no surprise bills.
⚡
Low-Latency Inference
// Mac M-series: 20-45 tok/s. Fast enough for real-time Telegram/WhatsApp chat and automated workflows.
hardware_requirements.md
💻 Minimum Hardware Requirements
> 8GB RAM / VRAM: runs 3B-7B models (llama3.2:3b, qwen2.5:7b). Great for basic chat.
> 16GB RAM / VRAM: runs 8B-14B models (llama3.1:8b, gemma2:9b). The sweet spot for general purpose.
> 32GB+ RAM / VRAM: runs 32B+ models (qwen2.5:32b). Suitable for complex coding and reasoning.
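The tiers above can be encoded in a small shell helper for scripting setups. The function name and thresholds below are illustrative, not part of OpenClaw or Ollama:

```shell
# suggest_model: map total RAM/VRAM in GB to a model tier from the table above
# (illustrative helper; cutoffs mirror the listed tiers)
suggest_model() {
  if [ "$1" -ge 32 ]; then
    echo "qwen2.5:32b"     # 32GB+: complex coding and reasoning
  elif [ "$1" -ge 16 ]; then
    echo "llama3.1:8b"     # 16GB: the general-purpose sweet spot
  else
    echo "llama3.2:3b"     # 8GB: basic chat
  fi
}

suggest_model 16   # prints llama3.1:8b
```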
step_01_install_ollama.sh
Step 1: Install Ollama & Pull a Model
# macOS
$ brew install ollama
# Linux (one-liner)
$ curl -fsSL https://ollama.ai/install.sh | sh
# Windows: download installer from ollama.ai
# Pull your first model (recommended for OpenClaw)
$ ollama pull llama3.1:8b
# pulling manifest... done ✓ 4.7 GB
$ ollama serve # starts on http://localhost:11434
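Once `ollama serve` is up, you can smoke-test the HTTP API directly. `/api/tags` (list pulled models) and `/api/generate` (one-off completion) are standard Ollama endpoints; the reachability guard is just to avoid a hang when the server isn't running:

```shell
OLLAMA_URL="http://localhost:11434"
payload='{"model": "llama3.1:8b", "prompt": "Say hi", "stream": false}'

# /api/tags lists pulled models; /api/generate runs a one-off prompt
if curl -s --max-time 2 "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  curl -s "$OLLAMA_URL/api/generate" -d "$payload"
else
  echo "Ollama is not reachable at $OLLAMA_URL"
fi
```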
model_comparison.md
📊 Recommended Models for OpenClaw
// Tested on Mac Mini M4 (16GB UM) and Hetzner CPX21 (Ubuntu)
| Model | Size | Speed | Best For |
|---|---|---|---|
| llama3.2:3b | 2.0 GB | ~45 tok/s | Fast replies, chat |
| llama3.1:8b (recommended) | 4.7 GB | ~28 tok/s | General purpose ✓ |
| llama3.1:70b | 40.0 GB | ~8 tok/s | Complex reasoning |
| gemma2:9b | 5.4 GB | ~25 tok/s | Code + structured output |
| mistral:7b | 4.1 GB | ~30 tok/s | EU data sovereignty |
| qwen2.5:14b | 8.9 GB | ~18 tok/s | Best for Chinese text |
step_02_config.yaml
Step 2: Connect OpenClaw to Ollama
# openclaw/config.yaml
ai:
  provider: "ollama"
  base_url: "http://localhost:11434"
  model: "llama3.1:8b"
  context_window: 8192
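If Ollama runs on a different machine (say, a GPU box on your LAN), point `base_url` at it instead; the address below is a placeholder:

```yaml
# openclaw/config.yaml — remote Ollama host (example address)
ai:
  provider: "ollama"
  base_url: "http://192.168.1.50:11434"
  model: "llama3.1:8b"
```

On the server side, start Ollama with `OLLAMA_HOST=0.0.0.0 ollama serve` so it accepts connections from other machines; by default it only listens on localhost.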
$ openclaw start
# ✓ Connected to Ollama at localhost:11434
# ✓ Model: llama3.1:8b (loaded, 4.7GB)
# ✓ OpenClaw ready.
performance_tips.md
⚡ Advanced Performance Tuning
Keep Ollama model loaded in RAM
$ OLLAMA_KEEP_ALIVE=24h ollama serve
// Prevents the model from being unloaded from memory between user requests. Eliminates the 5-10s cold-start delay.
Increase GPU layers (NVIDIA/AMD)
$ printf 'FROM llama3.1:8b\nPARAMETER num_gpu 33\n' > Modelfile && ollama create llama3.1-gpu -f Modelfile
// Forces more transformer layers onto the GPU for drastically faster inference. Ollama exposes this as the num_gpu parameter (Modelfile or per-request API options), not an environment variable.
Parallel requests for multi-user
$ OLLAMA_NUM_PARALLEL=4 ollama serve
// Allows 4 concurrent generations. Essential if you expose your OpenClaw bot to a group chat.
troubleshoot.log
🔧 Common Issues & Fixes
Q: Error: Connection refused (localhost:11434)
A: Ollama is not running. Start it with `ollama serve` or ensure the background service is active.
Q: OpenClaw replies are extremely slow
A: The model likely doesn't fit in VRAM/RAM and is falling back to CPU or swapping to disk. Switch to a smaller quantized model or add memory.
🚀 Next Steps
❓ FAQ
Q1. Which models work best?
Llama 3 (8B) for general use, Mistral 7B for speed, and CodeLlama for programming tasks. All run well on 16GB RAM.
Q2. How much RAM do I need?
8GB minimum for 7B models, 16GB recommended for 13B models, 32GB+ for 70B models. GPU acceleration optional but helpful.
Q3. Can I use multiple models?
Yes. OpenClaw's model routing lets you assign different models to different tasks. Use a fast model for chat and a powerful one for analysis.
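A routing setup like Q3 describes might look like this in config.yaml. The `routes` keys below are illustrative, not OpenClaw's confirmed schema; check the OpenClaw docs for the exact syntax:

```yaml
# openclaw/config.yaml — hypothetical per-task model routing
ai:
  provider: "ollama"
  base_url: "http://localhost:11434"
  model: "llama3.2:3b"        # default: fast chat replies
  routes:                     # illustrative per-task overrides
    analysis: "qwen2.5:14b"   # heavier model for long documents
    coding: "gemma2:9b"       # structured output / code
```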