A mid-size marketing agency was drowning in client emails. Three account managers spent half their day triaging requests, copying info into Salesforce, and writing status updates. They deployed OpenClaw to automate the entire loop, and the results were dramatic.
The Email Chaos
200+ daily client emails across 40 active accounts. Every email needed classification (urgent/normal/FYI), CRM entry, and appropriate routing. 6 hours of human time daily, zero value-add.
Architecture Overview
OpenClaw runs on a self-hosted Ubuntu 22.04 server (4-core, 8GB RAM) with IMAP access to the agency's Google Workspace email. It connects to Salesforce via REST API and uses Slack webhooks for internal routing. The local Llama-3-8B model (via Ollama) classifies email intent, extracts structured data, and generates draft replies.
┌──────────────┐    IMAP/30s    ┌──────────────┐
│   Gmail /    │───────────────▶│   OpenClaw   │
│  Workspace   │◀───────────────│     Node     │
└──────────────┘  Draft Reply   └──────┬───────┘
                                       │
             ┌─────────────────────────┼─────────────┐
             ▼                         ▼             ▼
        ┌──────────┐            ┌──────────┐  ┌──────────┐
        │Salesforce│            │  Slack   │  │  Notion  │
        │   CRM    │            │ Webhooks │  │  Log DB  │
        └──────────┘            └──────────┘  └──────────┘

OpenClaw Configuration
# IDENTITY.md for Email Triage Agent

You are an email triage specialist for a marketing agency. Your role is to classify, extract, and route client emails.

## Classification Rules

- URGENT: Client mentions deadline within 48h, budget issue, or complaint
- ACTION: Client requests deliverable, meeting, or status update
- FYI: Newsletter forwards, CC'd threads, auto-notifications
- SPAM: Vendor pitches, unrelated marketing

## Data Extraction

For every client email, extract:

1. Client company name (match against CRM)
2. Project name (match against active projects)
3. Mentioned deadlines or dates
4. Budget figures if any
5. Sentiment (positive / neutral / negative)

## Response Rules

- URGENT → Draft reply within template, alert #urgent-inbox on Slack
- ACTION → Create Salesforce task, draft reply, route to account owner
- FYI → Tag, archive, update CRM activity log
- Never auto-send replies. Always create drafts for human review.
# docker-compose.yml - Email Triage Stack
version: '3.8'
services:
  openclaw:
    image: openclaw/openclaw:latest
    ports:
      - "18789:18789"
    environment:
      - OPENCLAW_MODEL=ollama:llama3:8b
      - OPENCLAW_GATEWAY_TOKEN=${GATEWAY_TOKEN}
      - IMAP_HOST=imap.gmail.com
      - IMAP_PORT=993
      - IMAP_USER=inbox@agency.com
      - IMAP_PASSWORD=${IMAP_APP_PASSWORD}
      - IMAP_POLL_INTERVAL=30
      - SALESFORCE_CLIENT_ID=${SF_CLIENT_ID}
      - SALESFORCE_CLIENT_SECRET=${SF_CLIENT_SECRET}
      - SALESFORCE_REFRESH_TOKEN=${SF_REFRESH_TOKEN}
      - SLACK_WEBHOOK_URGENT=${SLACK_URGENT_WEBHOOK}
      - SLACK_WEBHOOK_DAILY=${SLACK_DAILY_WEBHOOK}
    volumes:
      - ./identity:/app/identity
      - ./data:/app/data
    restart: unless-stopped
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
volumes:
  ollama_data:

The Complete Workflow
1. Email Ingestion
OpenClaw polls Gmail IMAP every 30 seconds. New emails are parsed: sender, subject, body, attachments (filenames only), CC list, and thread context.
// OpenClaw polls: INBOX (UNSEEN) every 30s
// Parsed fields: from, subject, body, cc, date, attachments
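The parsing step can be sketched with Python's standard-library `email` module. This is an illustrative helper, not OpenClaw's actual code; the sample message and field names mirror the fields listed above.

```python
import email
from email.utils import parseaddr

def parse_email(raw: str) -> dict:
    """Hypothetical helper mirroring the fields parsed from each message."""
    msg = email.message_from_string(raw)
    body = msg.get_payload() if not msg.is_multipart() else ""
    return {
        "from": parseaddr(msg.get("From", ""))[1],
        "subject": msg.get("Subject", ""),
        "cc": [parseaddr(a)[1] for a in (msg.get("Cc") or "").split(",") if a.strip()],
        "date": msg.get("Date", ""),
        "body": body.strip(),
    }

raw = (
    "From: Sarah <sarah@acmecorp.com>\r\n"
    "Cc: jessica@agency.com\r\n"
    "Subject: Q2 Campaign - deadline concern\r\n"
    "Date: Mon, 16 Mar 2026 09:12:00 +0000\r\n"
    "\r\n"
    "We need the revised assets by Friday.\r\n"
)
parsed = parse_email(raw)
```

A real deployment would fetch `raw` over IMAP (e.g. `imaplib.IMAP4_SSL` with a `(UNSEEN)` search) rather than from a string.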
2. Intent Classification
The LLM classifies each email into URGENT / ACTION / FYI / SPAM with a confidence score. Emails below 80% confidence are flagged for human review.
Classification: URGENT (confidence: 0.94)
Reason: Client mentioned 'deadline Friday' + negative sentiment
Client: Acme Corp
Project: Q2 Campaign Refresh
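The confidence gate itself is simple. The labels and the 0.80 cutoff come from this deployment; the function below is an illustrative sketch of the routing decision, not OpenClaw's internals.

```python
LABELS = {"URGENT", "ACTION", "FYI", "SPAM"}

def route(label: str, confidence: float, threshold: float = 0.80) -> str:
    """Return the queue an email lands in after classification."""
    # Anything the model is unsure about goes to a human, never acted on automatically.
    if label not in LABELS or confidence < threshold:
        return "HUMAN_REVIEW"
    return label
```

With this gate, `route("URGENT", 0.94)` proceeds automatically while `route("ACTION", 0.72)` is held for review.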
3. Structured Data Extraction
Key fields are extracted and validated against CRM records. Fuzzy matching handles variations in company names (e.g., 'Acme' vs 'Acme Corporation Inc.').
Extracted:
  client: Acme Corp (SF ID: 001xx000003DGbW)
  project: Q2 Campaign Refresh (SF Opp: 006xx000001abc)
  deadline: 2026-03-21 (Friday)
  budget_mention: $45,000 (unchanged)
  sentiment: negative (0.23)
4. CRM Auto-Update
Salesforce records are updated via REST API: activity log entry, opportunity stage update if needed, contact 'last contact date' refresh.
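The Task-creation step can be sketched as a small payload builder. The function name and the sample IDs are illustrative; in practice the dict would be POSTed to the Salesforce REST endpoint with an OAuth bearer token (e.g. via `urllib.request` or the `simple-salesforce` library).

```python
def build_task(subject: str, contact_id: str, opportunity_id: str,
               priority: str = "Normal") -> dict:
    """Assemble a Salesforce Task body (illustrative helper)."""
    return {
        "Subject": subject,
        "WhoId": contact_id,        # the contact the email came from
        "WhatId": opportunity_id,   # the related opportunity/project
        "Status": "Open",
        "Priority": priority,
    }

task = build_task("Email: Q2 deadline concern", "003xx000002xyz",
                  "006xx000001abc", priority="High")
```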
POST /services/data/v59.0/sobjects/Task
{
  "Subject": "Email: Q2 deadline concern",
  "WhoId": "003xx000002xyz",
  "WhatId": "006xx000001abc",
  "Status": "Open",
  "Priority": "High"
}

5. Slack Routing + Draft Reply
Urgent emails trigger an immediate Slack alert with context. A draft reply is generated in Gmail for the account manager to review, edit, and send.
Slack #urgent-inbox:
🔴 URGENT from Sarah@AcmeCorp
Re: Q2 Campaign - deadline concern
→ Draft reply ready in Gmail
→ SF task created (High Priority)
→ Account owner: @jessica pinged
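Formatting that alert is straightforward. The helper below is a sketch: the function name is hypothetical, and the resulting string would be sent to the Slack incoming-webhook URL as a JSON body of the form `{"text": ...}`.

```python
def urgent_alert(sender: str, subject: str, owner: str) -> str:
    """Build the #urgent-inbox message text (illustrative helper)."""
    lines = [
        f"🔴 URGENT from {sender}",
        f"Re: {subject}",
        "→ Draft reply ready in Gmail",
        "→ SF task created (High Priority)",
        f"→ Account owner: @{owner} pinged",
    ]
    return "\n".join(lines)

msg = urgent_alert("Sarah@AcmeCorp", "Q2 Campaign - deadline concern", "jessica")
```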
Results After 90 Days
The transformation was measured across five key metrics over a 90-day period:
| Metric | Before | After | Change |
|---|---|---|---|
| Daily email admin time | 6 hours | 45 minutes | ↓ 87.5% |
| CRM data accuracy | ~60% | 95%+ | ↑ 58% |
| Average response time | 4.2 hours | 38 minutes | ↓ 85% |
| Missed urgent emails/week | 3-5 | 0 | ↓ 100% |
| Client satisfaction (NPS) | 34 | 67 | ↑ 97% |
"We went from dreading Monday mornings to having our inbox pre-sorted with draft replies waiting. It's like having a junior account manager who never sleeps." – Agency Director
Cost Analysis
| Item | Monthly Cost | Notes |
|---|---|---|
| Server (Hetzner CAX31) | $15 | 4-core ARM, 8GB RAM |
| Ollama + Llama-3-8B | $0 | Self-hosted, no API fees |
| Salesforce API | $0 | Included in existing plan |
| Slack (free tier) | $0 | Webhook-based, no upgrade needed |
| Google Workspace | $0 | Existing subscription |
| Total | $15/mo | vs ~$2,400/mo in junior staff time |
Annual savings: ~$28,000 in staff time redirected to client-facing work. ROI: 15,500% in the first year.
Security & Privacy
⚠️ Never store email credentials in plaintext. Use Docker secrets or a .env file with restricted permissions (chmod 600).
Frequently Asked Questions
Q1. What happens if the classification is wrong?
Emails scoring below the 80% confidence threshold are flagged for human review, and replies are never auto-sent, so a misclassification surfaces as a mislabeled draft or Slack alert that the account manager corrects, not as a wrong email reaching a client.
Q2. Does it handle attachments?
Attachment filenames are parsed and logged, but file contents are not read or analyzed.
Q3. Can it work with Outlook / Exchange?
The pipeline speaks standard IMAP, so any mailbox that exposes IMAP (including Exchange with IMAP enabled) should work with the same configuration.
Q4. What model size is recommended?
This deployment runs Llama-3-8B via Ollama on an 8GB server, which proved accurate enough once the IDENTITY.md rules and thread context were tuned.
Q5. How long to set up?
Lessons Learned
Start with IDENTITY, not code
80% of the accuracy improvements came from refining the natural-language rules in IDENTITY.md, not from model selection or configuration changes.
Confidence thresholds matter
Setting classification confidence to 80% (not 90%) was the sweet spot. 90% flagged too many emails for review, defeating the purpose.
Thread context is essential
Single-email classification missed context from ongoing threads. Adding the last 3 messages in the thread to the LLM prompt improved accuracy from 78% to 94%.
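The windowing itself is a one-liner. The sketch below illustrates the idea; the prompt layout and the `build_prompt` helper are assumptions, not OpenClaw's actual prompt format.

```python
def build_prompt(thread: list[dict], new_email: str, window: int = 3) -> str:
    """Prepend the last `window` thread messages so the model sees context."""
    context = thread[-window:]  # only the most recent messages
    parts = [f"[{m['from']}] {m['body']}" for m in context]
    parts.append(f"[NEW EMAIL] {new_email}")
    return "\n---\n".join(parts)

thread = [
    {"from": "sarah@acmecorp.com", "body": "Kickoff notes attached."},
    {"from": "jessica@agency.com", "body": "Draft concepts by March 10."},
    {"from": "sarah@acmecorp.com", "body": "Concepts look good."},
    {"from": "jessica@agency.com", "body": "Final assets in progress."},
]
prompt = build_prompt(thread, "Can we still hit the Friday deadline?")
```

Only the last three messages make it into the prompt; older history (the kickoff message above) is dropped to keep the context window small.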
Don't auto-send
The biggest lesson: never auto-send AI-generated replies. Always create drafts. The 38-minute response time is for human review + send, and clients trust the agency more because of it.