$ cd ../tutorials/
recommended · advanced · 25 min read · March 2026
$ cat local-ai-setup.md


/** Run OpenClaw with 100% local AI - Zero API costs, Complete Privacy */

![Ollama terminal showing OpenClaw integration](ollama-terminal.png)
section_01_intro.md

## 🏠 Why Run AI Locally?

Running AI models locally represents the ultimate form of sovereign intelligence. Instead of sending your data to cloud servers, everything stays on your own hardware. This approach offers significant advantages for privacy-conscious users and developers who want complete control over their AI assistant.

With the rise of powerful open-source models like Llama 3, Mistral, and Gemma, running local AI has never been more accessible. Combined with OpenClaw's flexible architecture, you can build a personal AI assistant that rivals cloud-based solutions while maintaining complete privacy.

// Benefits of local AI:
Zero API costs - No per-token charges, unlimited usage
Complete privacy - Your data never leaves your machine
Offline capable - Works without internet connection
Faster response - No network latency for local inference
Full customization - Fine-tune models for your use case
section_02_prerequisites.md

## 📦 Prerequisites

Before setting up local AI, ensure your system meets the minimum requirements. Local AI inference is computationally intensive, and having adequate hardware will significantly impact performance. The good news is that even a Mac Mini M2 with 16GB RAM can run impressive models.

macOS 12+ / Linux / Windows 11 (WSL2)
16GB+ RAM // 32GB+ recommended for larger models
Apple Silicon (M1/M2/M3) or NVIDIA GPU
20GB+ free storage // Models are 4-40GB each
OpenClaw installed // See getting-started guide

// 💡 Apple Silicon Macs are ideal for local AI - unified memory allows larger models
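A quick pre-flight check against these requirements is easy to script (Linux commands shown, with the macOS equivalents noted in comments):

```shell
# Total RAM - aim for 16GB+ (macOS: sysctl -n hw.memsize, which prints bytes)
grep MemTotal /proc/meminfo 2>/dev/null || sysctl -n hw.memsize

# Free disk space on your home volume - models live under ~/.ollama by default
df -h "$HOME"
```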

section_03_install.sh

## 🦙 Installing Ollama

Ollama is a powerful tool that makes running local LLMs incredibly simple. It handles model management, optimization, and provides a clean API that OpenClaw can connect to. The installation process takes just a few minutes and works seamlessly across all major platforms.

macOS / Linux Installation

$ curl -fsSL https://ollama.com/install.sh | sh

Verify Installation

$ ollama --version
ollama version 0.5.7
$ ollama serve
✓ Ollama is running on http://localhost:11434

Once installed, Ollama runs as a background service. It automatically manages model loading and unloading based on available memory, making it perfect for systems with limited resources. The service starts automatically on boot, so your AI assistant is always ready.
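You can confirm the background service is reachable by querying its HTTP API; `/api/tags` returns the models you have pulled as JSON:

```shell
# List pulled models via the REST API; a connection error here means
# the service is down (start it with: ollama serve)
curl -s http://localhost:11434/api/tags || echo "Ollama is not running"
```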

section_04_models.md

## 🎯 Choosing Your Model

[Image: Ollama model selection interface]

Choosing the right model depends on your hardware and use case. Larger models offer better reasoning and knowledge, but require more memory and are slower. For most personal assistant tasks, a 7B or 13B parameter model provides an excellent balance of quality and speed.

| Model | Size | RAM | Best For |
|-------|------|-----|----------|
| llama3.2:3b | 2GB | 8GB | Quick tasks, low-resource systems |
| llama3.3:8b | 4.7GB | 16GB | General assistant, coding |
| mistral:7b | 4.1GB | 16GB | Fast responses, multilingual |
| codellama:13b | 7.4GB | 24GB | Programming, code review |
| llama3.3:70b | 40GB | 64GB+ | Maximum capability |

Download Your Model

# Download the recommended model (4.7GB)
$ ollama pull llama3.3:8b
# Test it works
$ ollama run llama3.3:8b "Hello, introduce yourself"
section_05_integrate.sh

## 🔗 Integrating with OpenClaw

[Image: OpenClaw Ollama configuration interface]

OpenClaw provides native integration with Ollama through the LiteLLM provider system. This allows you to seamlessly switch between local and cloud models, or even use them together. The configuration is straightforward and can be done through the onboarding wizard or manually.

Option 1: Interactive Setup

$ openclaw onboard
# When prompted for AI provider, select:
> Ollama (local)
# Enter Ollama endpoint:
> http://localhost:11434
# Select model:
> llama3.3:8b

Option 2: Manual Configuration

# ~/.openclaw/config.yaml
ai_provider:
  type: "ollama"
  endpoint: "http://localhost:11434"
  model: "llama3.3:8b"
  context_length: 8192

Test the Integration

$ openclaw agent --message "What model are you running on?"
🦞 I'm currently running on Llama 3.3 8B locally via Ollama...
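If the agent does not answer, try talking to Ollama's REST API directly; when this works but OpenClaw does not, the problem is in the OpenClaw configuration rather than in Ollama:

```shell
# Bypass OpenClaw and prompt the model over Ollama's generate endpoint
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.3:8b", "prompt": "Reply with OK", "stream": false}' \
  || echo "Could not reach Ollama on localhost:11434"
```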
section_06_claude_code.md

## ✨ One-Prompt Setup with Claude Code

[Image: Claude Code terminal setup]

For users who prefer a guided approach, Claude Code (or similar AI coding assistants) can automate the entire setup process. Simply describe what you want, and the AI will handle installation, configuration, and testing. This method is ideal for beginners or those who want a quick setup.

The Magic Prompt

"Help me set up OpenClaw with local Ollama models. I want to connect it to WhatsApp and iMessage. Use Llama 3.3 as the default model. Configure security settings and set up automatic startup. Guide me step by step."

Claude Code will walk you through each step, automatically generating configuration files, testing the setup, and troubleshooting any issues. This approach combines the best of both worlds: the power of local AI with the convenience of guided setup.

// What Claude Code will do:
1. Check system requirements and prerequisites
2. Install Ollama and download your chosen model
3. Configure OpenClaw with optimal settings
4. Set up messaging channel bridges
5. Configure security and access controls
6. Create startup scripts for automatic launch
section_07_messaging.sh

## 💬 Configuring Messaging Channels

Once your local AI is running, connect it to your favorite messaging platforms. OpenClaw supports WhatsApp, Telegram, Discord, Slack, Signal, and iMessage. Each channel can be configured with different models or settings for specialized use cases.

WhatsApp Setup

$ openclaw channel add whatsapp
# Scan the QR code with your phone
✓ WhatsApp connected successfully

Telegram Setup

$ openclaw channel add telegram --token YOUR_BOT_TOKEN
✓ Telegram bot @YourOpenClawBot is now active

// 💡 Pro tip: Use different models for different channels. Code tasks can use CodeLlama while general chat uses Llama 3.3 for faster responses.
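In configuration terms, per-channel overrides could look something like the sketch below. Note that this `channels:` schema is illustrative, not confirmed OpenClaw syntax; check your version's documentation for the actual keys:

```yaml
# ~/.openclaw/config.yaml (hypothetical per-channel model overrides)
channels:
  whatsapp:
    model: "llama3.3:8b"     # fast general chat
  slack:
    model: "codellama:13b"   # code review and programming questions
```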

section_08_optimize.md

## ⚡ Performance Optimization

Getting the best performance from local AI requires some tuning. These optimizations can significantly reduce response latency and memory usage, especially on systems with limited resources.

Use Quantized Models
Q4_K_M quantization reduces model size by about 75% with minimal quality loss. Use `ollama pull llama3.3:8b-q4_K_M`

Enable GPU Acceleration
Ollama automatically uses Apple Metal or CUDA. Check which processor a loaded model is using with `ollama ps`

Adjust Context Length
A lower context window (4096 vs. 8192) reduces memory use and speeds up responses for simple tasks

Pre-load Models
Keep frequently used models warm with `OLLAMA_KEEP_ALIVE=24h`
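The pre-load tweak is just an environment variable on the process that runs `ollama serve`:

```shell
# Keep models resident for 24 hours instead of the default few minutes,
# so the first message of a session doesn't pay the model-load cost
export OLLAMA_KEEP_ALIVE=24h
# Restart the Ollama service afterwards so the new value takes effect
```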
section_09_troubleshoot.md

## 🔧 Troubleshooting

Error: Connection refused to localhost:11434
The Ollama service is not running. Start it with `ollama serve` or check `launchctl list | grep ollama`

Error: Out of memory
The model is too large for your RAM. Try a smaller model or a quantized version: `ollama pull llama3.2:3b`

Slow response times
The first response is slower because the model must load into memory; subsequent responses are faster. Use `OLLAMA_KEEP_ALIVE=24h` to keep models warm.
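For the connection-refused case, it also helps to check whether anything is listening on Ollama's default port (`lsof` shown here; it ships with macOS and most Linux distributions):

```shell
# Show the process listening on Ollama's default port, if any
lsof -iTCP:11434 -sTCP:LISTEN 2>/dev/null || echo "Nothing on 11434 - run: ollama serve"
```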
section_10_security.md

## 🔒 Security Best Practices

⚠️ Security Warning

A phishing site openclawd.ai (with a 'd') has been identified. Only use official sources: openclaw.ai or github.com/openclaw/openclaw

// Security checklist:
Bind Ollama to localhost only (default)
Use strong authentication for messaging channels
Enable rate limiting to prevent abuse
Regularly update OpenClaw and Ollama
Review permissions granted to the AI agent
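The first item on the checklist can be made explicit rather than left to the default:

```shell
# Ollama binds to loopback by default; setting it explicitly guards against
# an environment override (e.g. OLLAMA_HOST=0.0.0.0) exposing the API to your LAN
export OLLAMA_HOST=127.0.0.1:11434
```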
next_steps.md

## 🎉 Congratulations!

You now have a fully local, private AI assistant running on your own hardware! Explore the rest of the tutorial series to extend your setup.


// Join our community on Discord for more help

$ cd ../tutorials/
/* END_OF_TUTORIAL */