$ cd ../integrations/
🤖 AI Models · Recommended · v1.0+ · 100% local · free forever
$ cat ollama-integration.md
openclaw.useModel('ollama', { local: true })
/** OpenClaw + Ollama: The Ultimate Local AI Stack. Your data never leaves your machine. */
// Why Ollama is the default recommendation
🔒
100% Private & Sovereign
// No API calls. No token tracking. No data leaving your machine. Perfect for processing sensitive logs or personal documents.
💸
Free Forever
// Download 100+ open-weight models at zero cost. No rate limits, no subscriptions, no surprise bills.
⚡
Low-Latency Inference
// Mac M-series: 20-45 tok/s. Fast enough for real-time Telegram/WhatsApp chat and automated workflows.
hardware_requirements.md
💻 Minimum Hardware Requirements
> 8GB RAM / VRAM: runs 3B-7B models (llama3.2:3b, qwen2.5:7b). Great for basic chat.
> 16GB RAM / VRAM: runs 8B-14B models (llama3.1:8b, gemma2:9b). The sweet spot for general purpose.
> 32GB+ RAM / VRAM: runs 32B+ models (qwen2.5:32b). Suitable for complex coding and reasoning.
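The tiers above can be encoded in a small shell helper for scripting setups. The function name and thresholds below are illustrative, not part of OpenClaw or Ollama:

```shell
# suggest_model: map total RAM/VRAM in GB to a model tier from the table above
# (illustrative helper; cutoffs mirror the listed tiers)
suggest_model() {
  if [ "$1" -ge 32 ]; then
    echo "qwen2.5:32b"     # 32GB+: complex coding and reasoning
  elif [ "$1" -ge 16 ]; then
    echo "llama3.1:8b"     # 16GB: the general-purpose sweet spot
  else
    echo "llama3.2:3b"     # 8GB: basic chat
  fi
}

suggest_model 16   # prints llama3.1:8b
```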
step_01_install_ollama.sh
Step 1: Install Ollama & Pull a Model
# macOS
$ brew install ollama
# Linux (one-liner)
$ curl -fsSL https://ollama.ai/install.sh | sh
# Windows: download installer from ollama.ai
# Pull your first model (recommended for OpenClaw)
$ ollama pull llama3.1:8b
# pulling manifest... done ✓ 4.7 GB
$ ollama serve # starts on http://localhost:11434
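Once `ollama serve` is up, you can smoke-test the HTTP API directly. `/api/tags` (list pulled models) and `/api/generate` (one-off completion) are standard Ollama endpoints; the reachability guard is just to avoid a hang when the server isn't running:

```shell
OLLAMA_URL="http://localhost:11434"
payload='{"model": "llama3.1:8b", "prompt": "Say hi", "stream": false}'

# /api/tags lists pulled models; /api/generate runs a one-off prompt
if curl -s --max-time 2 "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  curl -s "$OLLAMA_URL/api/generate" -d "$payload"
else
  echo "Ollama is not reachable at $OLLAMA_URL"
fi
```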
model_comparison.md
📊 Recommended Models for OpenClaw
// Tested on Mac Mini M4 (16GB UM) and Hetzner CPX21 (Ubuntu)
| Model | Size | Speed | Best For |
|---|---|---|---|
| llama3.2:3b | 2.0 GB | ~45 tok/s | Fast replies, chat |
| llama3.1:8b (recommended) | 4.7 GB | ~28 tok/s | General purpose ✓ |
| llama3.1:70b | 40.0 GB | ~8 tok/s | Complex reasoning |
| gemma2:9b | 5.4 GB | ~25 tok/s | Code + structured output |
| mistral:7b | 4.1 GB | ~30 tok/s | EU data sovereignty |
| qwen2.5:14b | 8.9 GB | ~18 tok/s | Best for Chinese text |
step_02_config.yaml
Step 2: Connect OpenClaw to Ollama
# openclaw/config.yaml
ai:
  provider: "ollama"
  base_url: "http://localhost:11434"
  model: "llama3.1:8b"
  context_window: 8192
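If Ollama runs on a different machine (say, a GPU box on your LAN), point `base_url` at it instead; the address below is a placeholder:

```yaml
# openclaw/config.yaml — remote Ollama host (example address)
ai:
  provider: "ollama"
  base_url: "http://192.168.1.50:11434"
  model: "llama3.1:8b"
```

On the server side, start Ollama with `OLLAMA_HOST=0.0.0.0 ollama serve` so it accepts connections from other machines; by default it only listens on localhost.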
$ openclaw start
# ✓ Connected to Ollama at localhost:11434
# ✓ Model: llama3.1:8b (loaded, 4.7GB)
# ✓ OpenClaw ready.
performance_tips.md
⚡ Advanced Performance Tuning
Keep Ollama model loaded in RAM
$ OLLAMA_KEEP_ALIVE=24h ollama serve
// Prevents the model from being unloaded from memory between user requests. Eliminates the 5-10s cold-start delay.
Increase GPU layers (NVIDIA/AMD)
$ printf 'FROM llama3.1:8b\nPARAMETER num_gpu 33\n' > Modelfile && ollama create llama3.1-gpu -f Modelfile
// Forces more transformer layers onto the GPU for drastically faster inference. Ollama exposes this as the num_gpu parameter (Modelfile or per-request API options), not an environment variable.
Parallel requests for multi-user
$ OLLAMA_NUM_PARALLEL=4 ollama serve
// Allows 4 concurrent generations. Essential if you expose your OpenClaw bot to a group chat.
troubleshoot.log
🔧 Common Issues & Fixes
Q: Error: Connection refused (localhost:11434)
A: Ollama is not running. Start it with `ollama serve` or ensure the background service is active.
Q: OpenClaw replies are extremely slow
A: The model likely doesn't fit in VRAM/RAM and is falling back to CPU or swapping to disk. Switch to a smaller quantized model or add memory.
🚀 Next Steps
❓ FAQ
Q1. Which models work best?
Llama 3 (8B) for general use, Mistral 7B for speed, and CodeLlama for programming tasks. All run well on 16GB RAM.
Q2. How much RAM do I need?
8GB minimum for 7B models, 16GB recommended for 13B models, 32GB+ for 70B models. GPU acceleration optional but helpful.
Q3. Can I use multiple models?
Yes. OpenClaw's model routing lets you assign different models to different tasks. Use a fast model for chat and a powerful one for analysis.
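A routing setup like Q3 describes might look like this in config.yaml. The `routes` keys below are illustrative, not OpenClaw's confirmed schema; check the OpenClaw docs for the exact syntax:

```yaml
# openclaw/config.yaml — hypothetical per-task model routing
ai:
  provider: "ollama"
  base_url: "http://localhost:11434"
  model: "llama3.2:3b"        # default: fast chat replies
  routes:                     # illustrative per-task overrides
    analysis: "qwen2.5:14b"   # heavier model for long documents
    coding: "gemma2:9b"       # structured output / code
```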