A mid-size marketing agency was drowning in client emails. Three account managers spent half their day triaging requests, copying info into Salesforce, and writing status updates. They deployed OpenClaw to automate the entire loop, and the results were dramatic.
The Email Chaos
200+ daily client emails across 40 active accounts. Every email needed classification (urgent/normal/FYI), CRM entry, and appropriate routing. 6 hours of human time daily, zero value-add.
Architecture Overview
OpenClaw runs on a self-hosted Ubuntu 22.04 server (4-core, 8GB RAM) with IMAP access to the agency's Google Workspace email. It connects to Salesforce via REST API and uses Slack webhooks for internal routing. The local Llama-3-8B model (via Ollama) classifies email intent, extracts structured data, and generates draft replies.
┌──────────────┐    IMAP/30s    ┌──────────────┐
│   Gmail /    │───────────────▶│   OpenClaw   │
│  Workspace   │◀───────────────│     Node     │
└──────────────┘  Draft Reply   └──────┬───────┘
                                       │
             ┌─────────────────────────┼─────────────┐
             ▼                         ▼             ▼
        ┌──────────┐            ┌──────────┐  ┌──────────┐
        │Salesforce│            │  Slack   │  │  Notion  │
        │   CRM    │            │ Webhooks │  │  Log DB  │
        └──────────┘            └──────────┘  └──────────┘

OpenClaw Configuration
# IDENTITY.md for Email Triage Agent

You are an email triage specialist for a marketing agency. Your role is to classify, extract, and route client emails.

## Classification Rules

- URGENT: Client mentions deadline within 48h, budget issue, or complaint
- ACTION: Client requests deliverable, meeting, or status update
- FYI: Newsletter forwards, CC'd threads, auto-notifications
- SPAM: Vendor pitches, unrelated marketing

## Data Extraction

For every client email, extract:

1. Client company name (match against CRM)
2. Project name (match against active projects)
3. Mentioned deadlines or dates
4. Budget figures if any
5. Sentiment (positive / neutral / negative)

## Response Rules

- URGENT → Draft reply within template, alert #urgent-inbox on Slack
- ACTION → Create Salesforce task, draft reply, route to account owner
- FYI → Tag, archive, update CRM activity log
- Never auto-send replies. Always create drafts for human review.
# docker-compose.yml - Email Triage Stack
version: '3.8'
services:
  openclaw:
    image: openclaw/openclaw:latest
    ports:
      - "18789:18789"
    environment:
      - OPENCLAW_MODEL=ollama:llama3:8b
      - OPENCLAW_GATEWAY_TOKEN=${GATEWAY_TOKEN}
      - IMAP_HOST=imap.gmail.com
      - IMAP_PORT=993
      - IMAP_USER=inbox@agency.com
      - IMAP_PASSWORD=${IMAP_APP_PASSWORD}
      - IMAP_POLL_INTERVAL=30
      - SALESFORCE_CLIENT_ID=${SF_CLIENT_ID}
      - SALESFORCE_CLIENT_SECRET=${SF_CLIENT_SECRET}
      - SALESFORCE_REFRESH_TOKEN=${SF_REFRESH_TOKEN}
      - SLACK_WEBHOOK_URGENT=${SLACK_URGENT_WEBHOOK}
      - SLACK_WEBHOOK_DAILY=${SLACK_DAILY_WEBHOOK}
    volumes:
      - ./identity:/app/identity
      - ./data:/app/data
    restart: unless-stopped
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
volumes:
  ollama_data:

The Complete Workflow
1. Email Ingestion
OpenClaw polls Gmail IMAP every 30 seconds. New emails are parsed: sender, subject, body, attachments (filenames only), CC list, and thread context.
// OpenClaw polls: INBOX (UNSEEN) every 30s
// Parsed fields: from, subject, body, cc, date, attachments
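The parsing step can be sketched with Python's standard-library `email` module. This is an illustrative helper, not OpenClaw's actual code; the sample message and field names mirror the fields listed above.

```python
import email
from email.utils import parseaddr

def parse_email(raw: str) -> dict:
    """Hypothetical helper mirroring the fields parsed from each message."""
    msg = email.message_from_string(raw)
    body = msg.get_payload() if not msg.is_multipart() else ""
    return {
        "from": parseaddr(msg.get("From", ""))[1],
        "subject": msg.get("Subject", ""),
        "cc": [parseaddr(a)[1] for a in (msg.get("Cc") or "").split(",") if a.strip()],
        "date": msg.get("Date", ""),
        "body": body.strip(),
    }

raw = (
    "From: Sarah <sarah@acmecorp.com>\r\n"
    "Cc: jessica@agency.com\r\n"
    "Subject: Q2 Campaign - deadline concern\r\n"
    "Date: Mon, 16 Mar 2026 09:12:00 +0000\r\n"
    "\r\n"
    "We need the revised assets by Friday.\r\n"
)
parsed = parse_email(raw)
```

A real deployment would fetch `raw` over IMAP (e.g. `imaplib.IMAP4_SSL` with a `(UNSEEN)` search) rather than from a string.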
2. Intent Classification
The LLM classifies each email into URGENT / ACTION / FYI / SPAM with a confidence score. Emails below 80% confidence are flagged for human review.
Classification: URGENT (confidence: 0.94)
Reason: Client mentioned 'deadline Friday' + negative sentiment
Client: Acme Corp
Project: Q2 Campaign Refresh
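The confidence gate itself is simple. The labels and the 0.80 cutoff come from this deployment; the function below is an illustrative sketch of the routing decision, not OpenClaw's internals.

```python
LABELS = {"URGENT", "ACTION", "FYI", "SPAM"}

def route(label: str, confidence: float, threshold: float = 0.80) -> str:
    """Return the queue an email lands in after classification."""
    # Anything the model is unsure about goes to a human, never acted on automatically.
    if label not in LABELS or confidence < threshold:
        return "HUMAN_REVIEW"
    return label
```

With this gate, `route("URGENT", 0.94)` proceeds automatically while `route("ACTION", 0.72)` is held for review.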
3. Structured Data Extraction
Key fields are extracted and validated against CRM records. Fuzzy matching handles variations in company names (e.g., 'Acme' vs 'Acme Corporation Inc.').
Extracted:
  client: Acme Corp (SF ID: 001xx000003DGbW)
  project: Q2 Campaign Refresh (SF Opp: 006xx000001abc)
  deadline: 2026-03-21 (Friday)
  budget_mention: $45,000 (unchanged)
  sentiment: negative (0.23)
4. CRM Auto-Update
Salesforce records are updated via REST API: activity log entry, opportunity stage update if needed, contact 'last contact date' refresh.
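The Task-creation step can be sketched as a small payload builder. The function name and the sample IDs are illustrative; in practice the dict would be POSTed to the Salesforce REST endpoint with an OAuth bearer token (e.g. via `urllib.request` or the `simple-salesforce` library).

```python
def build_task(subject: str, contact_id: str, opportunity_id: str,
               priority: str = "Normal") -> dict:
    """Assemble a Salesforce Task body (illustrative helper)."""
    return {
        "Subject": subject,
        "WhoId": contact_id,        # the contact the email came from
        "WhatId": opportunity_id,   # the related opportunity/project
        "Status": "Open",
        "Priority": priority,
    }

task = build_task("Email: Q2 deadline concern", "003xx000002xyz",
                  "006xx000001abc", priority="High")
```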
POST /services/data/v59.0/sobjects/Task
{
  "Subject": "Email: Q2 deadline concern",
  "WhoId": "003xx000002xyz",
  "WhatId": "006xx000001abc",
  "Status": "Open",
  "Priority": "High"
}

5. Slack Routing + Draft Reply
Urgent emails trigger an immediate Slack alert with context. A draft reply is generated in Gmail for the account manager to review, edit, and send.
Slack #urgent-inbox:
🔴 URGENT from Sarah@AcmeCorp
Re: Q2 Campaign - deadline concern
→ Draft reply ready in Gmail
→ SF task created (High Priority)
→ Account owner: @jessica pinged
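Formatting that alert is straightforward. The helper below is a sketch: the function name is hypothetical, and the resulting string would be sent to the Slack incoming-webhook URL as a JSON body of the form `{"text": ...}`.

```python
def urgent_alert(sender: str, subject: str, owner: str) -> str:
    """Build the #urgent-inbox message text (illustrative helper)."""
    lines = [
        f"🔴 URGENT from {sender}",
        f"Re: {subject}",
        "→ Draft reply ready in Gmail",
        "→ SF task created (High Priority)",
        f"→ Account owner: @{owner} pinged",
    ]
    return "\n".join(lines)

msg = urgent_alert("Sarah@AcmeCorp", "Q2 Campaign - deadline concern", "jessica")
```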
Results After 90 Days
The transformation was measured across five key metrics over a 90-day period:
| Metric | Before | After | Change |
|---|---|---|---|
| Daily email admin time | 6 hours | 45 minutes | ↓ 87.5% |
| CRM data accuracy | ~60% | 95%+ | ↑ 58% |
| Average response time | 4.2 hours | 38 minutes | ↓ 85% |
| Missed urgent emails/week | 3-5 | 0 | ↓ 100% |
| Client satisfaction (NPS) | 34 | 67 | ↑ 97% |
"We went from dreading Monday mornings to having our inbox pre-sorted with draft replies waiting. It's like having a junior account manager who never sleeps." – Agency Director
Cost Analysis
| Item | Monthly Cost | Notes |
|---|---|---|
| Server (Hetzner CAX31) | $15 | 4-core ARM, 8GB RAM |
| Ollama + Llama-3-8B | $0 | Self-hosted, no API fees |
| Salesforce API | $0 | Included in existing plan |
| Slack (free tier) | $0 | Webhook-based, no upgrade needed |
| Google Workspace | $0 | Existing subscription |
| Total | $15/mo | vs ~$2,400/mo in junior staff time |
Annual savings: ~$28,000 in staff time redirected to client-facing work. ROI: 15,500% in the first year.
Security & Privacy
⚠️ Never store email credentials in plaintext. Use Docker secrets or a .env file with restricted permissions (chmod 600).
Frequently Asked Questions
Q1. What happens if the classification is wrong?
Emails scoring below the 80% confidence threshold are flagged for human review, and replies are never auto-sent, so a misclassification surfaces as a mislabeled draft or Slack alert that the account manager corrects, not as a wrong email reaching a client.
Q2. Does it handle attachments?
Attachment filenames are parsed and logged, but file contents are not read or analyzed.
Q3. Can it work with Outlook / Exchange?
The pipeline speaks standard IMAP, so any mailbox that exposes IMAP (including Exchange with IMAP enabled) should work with the same configuration.
Q4. What model size is recommended?
This deployment runs Llama-3-8B via Ollama on an 8GB server, which proved accurate enough once the IDENTITY.md rules and thread context were tuned.
Q5. How long to set up?
Lessons Learned
Start with IDENTITY, not code
80% of the accuracy improvements came from refining the natural-language rules in IDENTITY.md, not from model selection or configuration changes.
Confidence thresholds matter
Setting classification confidence to 80% (not 90%) was the sweet spot. 90% flagged too many emails for review, defeating the purpose.
Thread context is essential
Single-email classification missed context from ongoing threads. Adding the last 3 messages in the thread to the LLM prompt improved accuracy from 78% to 94%.
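The windowing itself is a one-liner. The sketch below illustrates the idea; the prompt layout and the `build_prompt` helper are assumptions, not OpenClaw's actual prompt format.

```python
def build_prompt(thread: list[dict], new_email: str, window: int = 3) -> str:
    """Prepend the last `window` thread messages so the model sees context."""
    context = thread[-window:]  # only the most recent messages
    parts = [f"[{m['from']}] {m['body']}" for m in context]
    parts.append(f"[NEW EMAIL] {new_email}")
    return "\n---\n".join(parts)

thread = [
    {"from": "sarah@acmecorp.com", "body": "Kickoff notes attached."},
    {"from": "jessica@agency.com", "body": "Draft concepts by March 10."},
    {"from": "sarah@acmecorp.com", "body": "Concepts look good."},
    {"from": "jessica@agency.com", "body": "Final assets in progress."},
]
prompt = build_prompt(thread, "Can we still hit the Friday deadline?")
```

Only the last three messages make it into the prompt; older history (the kickoff message above) is dropped to keep the context window small.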
Don't auto-send
The biggest lesson: never auto-send AI-generated replies. Always create drafts. The 38-minute response time is for human review + send, and clients trust the agency more because of it.