$ cd ../integrations/
πŸ€– AI Models Β· Recommended Β· v1.0+ Β· 100% local Β· free forever
$ cat ollama-integration.md

openclaw.useModel('ollama', { local: true })

/** OpenClaw + Ollama: The Ultimate Local AI Stack. Your data never leaves your machine. */

// Why Ollama is the default recommendation
πŸ”’ 100% Private & Sovereign
// No API calls. No token tracking. No data leaving your machine. Perfect for processing sensitive logs or personal documents.

πŸ’Έ Free Forever
// Download 100+ open-weight models at zero cost. No rate limits, no subscriptions, no surprise bills.

⚑ Low-Latency Inference
// Mac M-series: 20-45 tok/s. Fast enough for real-time Telegram/WhatsApp chat and automated workflows.
hardware_requirements.md

πŸ’» Minimum Hardware Requirements

> 8GB RAM / VRAM
  Runs 3B-7B models (llama3.2:3b, qwen2.5:7b). Great for basic chat.
> 16GB RAM / VRAM
  Runs 8B-14B models (llama3.1:8b, gemma3:9b). The sweet spot for general-purpose use.
> 32GB+ RAM / VRAM
  Runs 32B+ models (qwen2.5:32b). Suitable for complex coding and reasoning.
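Before picking a tier, it helps to check how much memory the machine actually has. A minimal sketch (on unified-memory Macs, RAM and VRAM are the same pool; with a discrete GPU, check VRAM separately with your vendor's tools):

```shell
# Print total system memory so you can match it to the tiers above.
if [ "$(uname)" = "Darwin" ]; then
  # macOS: hw.memsize is total RAM in bytes
  sysctl -n hw.memsize | awk '{printf "%.0f GB RAM\n", $1/1073741824}'
else
  # Linux: total from the Mem: row of free -g
  free -g | awk '/^Mem:/{print $2 " GB RAM"}'
fi
```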
step_01_install_ollama.sh

Step 1: Install Ollama & Pull a Model

# macOS
$ brew install ollama
# Linux (one-liner)
$ curl -fsSL https://ollama.ai/install.sh | sh
# Windows: download installer from ollama.ai
# Pull your first model (recommended for OpenClaw)
$ ollama pull llama3.1:8b
# pulling manifest... done βœ“ 4.7 GB
$ ollama serve # starts on http://localhost:11434
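To confirm the daemon is actually up before moving on, you can hit Ollama's HTTP API directly; `/api/tags` lists locally pulled models and returns quickly:

```shell
# Verify the Ollama daemon is reachable on its default port.
if curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "ollama: up"
else
  echo "ollama: not running -- start it with 'ollama serve'"
fi
```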
model_comparison.md

πŸ“Š Recommended Models for OpenClaw

// Tested on a Mac Mini M4 (16GB unified memory) and a Hetzner CPX21 (Ubuntu)

| Model | Size | Speed | Best For |
|---|---|---|---|
| llama3.2:3b | 2.0 GB | ~45 tok/s | Fast replies, chat |
| llama3.1:8b ⭐ | 4.7 GB | ~28 tok/s | General purpose (recommended) |
| llama3.1:70b | 40.0 GB | ~8 tok/s | Complex reasoning |
| gemma3:9b | 5.4 GB | ~25 tok/s | Code + structured output |
| mistral:7b | 4.1 GB | ~30 tok/s | EU data sovereignty |
| qwen2.5:14b | 8.9 GB | ~18 tok/s | Best for Chinese text |
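Throughput varies a lot with hardware and quantization, so treat the numbers above as ballpark figures. You can measure your own: `ollama run --verbose` prints timing stats (including "eval rate" in tokens/s) after the response:

```shell
# Measure tokens/sec on your own machine with a short prompt.
if command -v ollama >/dev/null 2>&1; then
  ollama run llama3.1:8b --verbose "Reply with exactly one word."
else
  echo "ollama CLI not found -- see Step 1"
fi
```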
step_02_config.yaml

Step 2: Connect OpenClaw to Ollama

# openclaw/config.yaml
ai:
  provider: "ollama"
  base_url: "http://localhost:11434"
  model: "llama3.1:8b"
  context_window: 8192
$ openclaw start
# βœ“ Connected to Ollama at localhost:11434
# βœ“ Model: llama3.1:8b (loaded, 4.7GB)
# βœ“ OpenClaw ready.
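If OpenClaw can't connect, it's useful to rule Ollama out first by smoke-testing the configured model against the `/api/generate` endpoint directly (`"stream": false` returns the reply as a single JSON object):

```shell
# Send one prompt straight to Ollama, bypassing OpenClaw.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Say hello.", "stream": false}' \
  || echo "no response -- is 'ollama serve' running?"
```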
performance_tips.md

⚑ Advanced Performance Tuning

Keep Ollama model loaded in RAM
$ OLLAMA_KEEP_ALIVE=24h ollama serve
// Keeps the model resident in memory between requests, eliminating the 5-10s cold-start delay.
Increase GPU layers (NVIDIA/AMD)
$ OLLAMA_GPU_LAYERS=33 ollama serve
// Forces more transformer layers onto the GPU for drastically faster inference.
Parallel requests for multi-user
$ OLLAMA_NUM_PARALLEL=4 ollama serve
// Allows 4 concurrent generations. Essential if you expose your OpenClaw bot to a group chat.
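On Linux, the install script registers Ollama as a systemd service, so environment variables set in your shell won't reach it. A sketch of persisting the settings above via a systemd drop-in (the unit name and path assume the default installer; run `systemctl daemon-reload && systemctl restart ollama` afterwards):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_KEEP_ALIVE=24h"
Environment="OLLAMA_NUM_PARALLEL=4"
```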
troubleshoot.log

πŸ”§ Common Issues & Fixes

Q: Error: Connection refused (localhost:11434)
A: Ollama is not running. Start it with `ollama serve` or ensure the background service is active.
Q: OpenClaw replies are extremely slow
A: The model probably doesn't fit in VRAM/RAM and is spilling into CPU or swap. Switch to a smaller or more aggressively quantized model, or add memory.
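A quick first diagnostic for slow replies: `ollama ps` shows which models are loaded and how they're split between CPU and GPU ("100% GPU" in the PROCESSOR column is what you want; a large CPU share explains low tok/s):

```shell
# Inspect loaded models and their CPU/GPU placement.
if command -v ollama >/dev/null 2>&1; then
  ollama ps || echo "daemon not reachable -- run 'ollama serve' first"
else
  echo "ollama CLI not found -- see Step 1"
fi
```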

❓ FAQ

Q1. Which models work best?

Llama 3.1 (8B) for general use, Mistral 7B for speed, and CodeLlama for programming tasks. All run well on 16GB RAM.

Q2. How much RAM do I need?

8GB minimum for 7B models, 16GB recommended for 13B models, 32GB+ for 70B models. GPU acceleration optional but helpful.

Q3. Can I use multiple models?

Yes. OpenClaw's model routing lets you assign different models to different tasks. Use a fast model for chat and a powerful one for analysis.
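A sketch of what such a split could look like in config.yaml. The `routes` key and task names here are illustrative assumptions, not a verified OpenClaw schema; check your version's documentation for the exact keys:

```yaml
# openclaw/config.yaml -- hypothetical routing section
ai:
  provider: "ollama"
  base_url: "http://localhost:11434"
  model: "llama3.2:3b"         # fast default for chat
  routes:                      # assumed key name, for illustration
    analysis: "qwen2.5:14b"    # heavier model for reasoning tasks
```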