1. Why Assign Different Models to Different Agents?

While tinkering with the oh-my-opencode plugin recently, I stumbled upon an interesting problem: the plugin developers recommend different LLM providers for different agents, but I have a collection of Chinese models at my disposal (GLM-4.7, GLM-5.0, MiniMax-M2.5, DeepSeek-V3.2, Kimi-K2.5). How should I allocate them to get the most out of each?
Just as on a team where some people excel at design, others at coding, and others at documentation, models should be assigned the same way: there's no single best model, only the most suitable one for each task.
2. Analyzing the “Personalities” of Five Chinese LLMs

Before assigning tasks, I needed to understand what each model excels at. I spent time digging through their benchmark data and discovered that each model has its own “superpower.”
2.1 GLM Series: Zhipu AI’s Twin Stars
GLM-4.7 (355B parameters, 32B active)
- Strong mathematical reasoning (MATH 92%)
- Solid coding capabilities (LiveCodeBench 84.9%)
- Multimodal support
- Moderate pricing ($0.60 input / $2.20 output per M tokens)
GLM-5.0 (744B parameters, 40B active)
- Doubled parameters, but math benchmark actually dropped (MATH 88%)
- SOTA-level performance on Agent tasks
- 56% lower hallucination rate than 4.7 (this matters!)
- Most expensive of the five ($1.00 input / $3.20 output per M tokens)
Takeaway: GLM-5.0 seems purpose-built for complex agentic tasks. It may not match 4.7 on math benchmarks, but it's more stable and reliable. Think of it as a "brain" rather than a "calculator."
2.2 MiniMax-M2.5: The Bang-for-Buck Champion
Key Stats (~230B parameters, 10B active)
- Highest SWE-bench Verified score (80.2%)
- Blazing fast inference (Lightning mode: 100 tok/s)
- Cheapest option ($0.30 input / $1.20 output per M tokens)
- Storage-friendly (quantizable to 96GB)
Takeaway: This is the legendary “fast and frugal” option. If you need extensive code reviews and quick modifications, you can’t go wrong with this one.
2.3 DeepSeek-V3.2: Math Genius + Penny Pincher
Key Stats (671B parameters, 37B active)
- Highest AIME 2026 score (94.17%)
- IMO/IOI gold medal level
- Extremely cheap ($0.28 input / $0.42 output per M tokens, 27x cheaper than GPT-4o)
- Text-only mode
Takeaway: If you need deep reasoning and long-running autonomous work without burning through your budget, this is your best bet.
2.4 Kimi-K2.5: The Multimodal All-Rounder
Key Stats (1T parameters, 32B active)
- Largest context window (256K)
- Strongest multimodal capabilities (MMMU 78.5%, OCRBench 92.3%)
- Video understanding support
- Agent Swarm (up to 100 sub-agents)
Takeaway: When you need to process images, videos, or long documents, this is your heavyweight champion.
3. Understanding oh-my-opencode’s Agent Architecture
Before assigning models, I studied oh-my-opencode’s agent architecture and discovered two categories of agents:
3.1 Primary Agents (Follow UI Model Selection)
- Sisyphus: The conductor, orchestrates tasks and delegates work
- Hephaestus: Deep worker, executes tasks end-to-end
- Atlas: Main model for UI interactions
- Prometheus: Strategic planner
3.2 Subagents (Independent Model Config)
- Oracle: Complex debugging, architecture consultant (EXPENSIVE)
- Librarian: Documentation retrieval, external library queries (CHEAP)
- Explore: Codebase search specialist (CHEAP)
- Metis: Pre-planning consultant, identifies implicit intents (EXPENSIVE)
- Momus: Plan reviewer (CHEAP)
- Multimodal-looker: Image/video analysis (EXPENSIVE)
3.3 Categories (Auto-invoked by Task Type)
There are also 8 Categories automatically invoked based on task type:
- visual-engineering: Frontend UI
- ultrabrain: Complex logic
- deep: Deep work
- artistry: Creative tasks
- quick: Quick modifications
- unspecified-low/high: Simple/complex tasks
- writing: Documentation generation
4. Configuration Strategy and Principles

4.1 Principle 1: Right Tool for the Right Job
Multimodal Tasks → Kimi-K2.5
- Why: MMMU 78.5%, MathVision 84.2%, OCRBench 92.3%
- Best for: UI design, image analysis, video understanding
Code Analysis → MiniMax-M2.5
- Why: SWE-bench Verified 80.2% (top score)
- Best for: Code review, debugging, architecture analysis
Complex Reasoning → GLM-5.0
- Why: Low hallucination rate (56% lower than 4.7), SOTA on Agent tasks
- Best for: Complex planning, architecture design, orchestration
Cost Optimization → DeepSeek-V3.2
- Why: Dirt cheap ($0.28/M), strong math capabilities
- Best for: Documentation retrieval, long-running autonomous work
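The "right tool for the right job" rule boils down to a lookup from task type to model. A minimal sketch, assuming made-up task labels and model IDs for illustration only (they are not oh-my-opencode identifiers):

```python
# Hypothetical task-to-model routing table illustrating Principle 1;
# the task labels and model IDs are made up for this sketch.
MODEL_FOR_TASK = {
    "multimodal": "kimi-k2.5",    # MMMU 78.5%, OCRBench 92.3%
    "code": "minimax-m2.5",       # SWE-bench Verified 80.2%
    "reasoning": "glm-5.0",       # lowest hallucination rate
    "budget": "deepseek-v3.2",    # $0.28/M input tokens
}

def pick_model(task_type: str) -> str:
    """Return the preferred model for a task type, falling back to the
    cheap generalist when the task is unclassified."""
    return MODEL_FOR_TASK.get(task_type, "deepseek-v3.2")

print(pick_model("code"))     # minimax-m2.5
print(pick_model("unknown"))  # deepseek-v3.2
```

The fallback encodes Principle 2 in miniature: when in doubt, default to the cheap option and reserve premium models for tasks you have explicitly classified as hard.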
4.2 Principle 2: Reserve Premium Models for Critical Tasks
Use stronger models for EXPENSIVE-tier agents (Oracle, Metis, Multimodal-looker), and value-oriented models for CHEAP-tier agents (Librarian, Explore, Momus).
5. Final Configuration
5.1 Agents Configuration Table
| Agent | Selected Model | Reason |
|---|---|---|
| hephaestus | DeepSeek-V3.2 | Deep autonomous work requiring long runtime—pick the cheapest |
| oracle | MiniMax-M2.5 | Highest SWE-bench score, strong code analysis |
| librarian | DeepSeek-V3.2 | Doc retrieval doesn’t need heavy lifting—go cheap |
| explore | GLM-4.7 | Code search needs balanced performance and cost |
| multimodal-looker | Kimi-K2.5 | Visual analysis requires the strongest multimodal model |
| prometheus | GLM-5.0 | Strategic planning needs low hallucination and strong reasoning |
| metis | Kimi-K2.5 | Intent analysis requires strong understanding and long context |
| momus | MiniMax-M2.5 | Plan review needs speed and accuracy |
| atlas | Kimi-K2.5 | UI interaction needs multimodal support |
5.2 Categories Configuration Table
| Category | Selected Model | Reason |
|---|---|---|
| visual-engineering | Kimi-K2.5 | Frontend UI design needs multimodal capabilities |
| ultrabrain | GLM-5.0 | Complex logic needs strongest reasoning with low hallucination |
| deep | DeepSeek-V3.2 | Deep work needs long runtime—optimize for cost |
| artistry | Kimi-K2.5 | Creative tasks need multimodal and Agent Swarm |
| quick | MiniMax-M2.5 | Quick modifications need fast response and low cost |
| unspecified-low | MiniMax-M2.5 | Simple tasks use best value option |
| unspecified-high | GLM-5.0 | Complex tasks use strongest reasoning |
| writing | Kimi-K2.5 | Long documents need 256K context |
6. Configuration Code
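The tables in section 5 translate into opencode's JSON config roughly as follows. This is a sketch, not the plugin's exact schema: the `agent`/`categories` key layout and the provider prefixes (`deepseek/`, `minimax/`, `zhipu/`, `moonshot/`) are assumptions from my setup, so check the oh-my-opencode README for the real field names and your provider's actual model IDs:

```json
{
  "agent": {
    "hephaestus": { "model": "deepseek/deepseek-v3.2" },
    "oracle": { "model": "minimax/minimax-m2.5" },
    "librarian": { "model": "deepseek/deepseek-v3.2" },
    "explore": { "model": "zhipu/glm-4.7" },
    "multimodal-looker": { "model": "moonshot/kimi-k2.5" },
    "prometheus": { "model": "zhipu/glm-5.0" },
    "metis": { "model": "moonshot/kimi-k2.5" },
    "momus": { "model": "minimax/minimax-m2.5" },
    "atlas": { "model": "moonshot/kimi-k2.5" }
  },
  "categories": {
    "visual-engineering": "moonshot/kimi-k2.5",
    "ultrabrain": "zhipu/glm-5.0",
    "deep": "deepseek/deepseek-v3.2",
    "artistry": "moonshot/kimi-k2.5",
    "quick": "minimax/minimax-m2.5",
    "unspecified-low": "minimax/minimax-m2.5",
    "unspecified-high": "zhipu/glm-5.0",
    "writing": "moonshot/kimi-k2.5"
  }
}
```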
7. Cost Optimization Results

Compared to using opencode/glm-4.7-free ($0.60/M) for everything:
| Task Type | Original Cost | New Cost | Savings |
|---|---|---|---|
| Doc Retrieval (Librarian) | $0.60/M | $0.28/M | 53% |
| Quick Edits (Quick) | $0.60/M | $0.30/M | 50% |
| Deep Work (Deep) | $0.60/M | $0.28/M | 53% |
| Code Review (Momus) | $0.60/M | $0.30/M | 50% |
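The savings column is straightforward arithmetic over the per-M-token input prices: (old − new) / old, rounded to whole percents. A quick check using the prices from the table:

```python
# Verify the savings column: (baseline - new) / baseline,
# using the per-M-token input prices from the table above.
baseline = 0.60  # opencode/glm-4.7-free, $/M input tokens

new_prices = {
    "Doc Retrieval (Librarian)": 0.28,  # DeepSeek-V3.2
    "Quick Edits (Quick)": 0.30,        # MiniMax-M2.5
    "Deep Work (Deep)": 0.28,           # DeepSeek-V3.2
    "Code Review (Momus)": 0.30,        # MiniMax-M2.5
}

for task, price in new_prices.items():
    savings = (baseline - price) / baseline
    print(f"{task}: {savings:.0%}")  # 53% or 50%, matching the table
```

Note this compares list prices per token, not total spend; actual savings depend on how many tokens each agent consumes in practice.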
8. Lessons Learned and Tips
8.1 Don’t Blindly Chase “Newest and Strongest”
GLM-5.0 has twice the parameters of 4.7, but its math benchmark actually dropped. Parameter count isn’t everything—focus on your specific task requirements.
8.2 Multimodal Capabilities Really Matter
I initially underestimated the importance of multimodal features. When you need to analyze UI screenshots, process charts, or understand code flow diagrams, Kimi-K2.5 noticeably outperforms the rest.
8.3 Optimize Cost-Sensitive Tasks Separately
CHEAP-tier agents like Librarian and Explore are called frequently but don’t need heavy lifting. Switching to DeepSeek-V3.2 significantly reduced overall costs.
8.4 Reserve EXPENSIVE Agents for Critical Scenarios
Don’t cheap out on EXPENSIVE-tier agents like Oracle and Metis. They handle complex tasks that require strong reasoning capabilities.
8.5 Testing Beats Theory
After configuration, I recommend testing these typical scenarios:
- Code search (triggers Explore)
- Documentation retrieval (triggers Librarian)
- Visual analysis (triggers Multimodal-looker)
- Complex architecture design (triggers Oracle or Ultrabrain)
9. Conclusion
The biggest takeaway from this configuration exercise: there’s no best model, only the most suitable one.
- Kimi-K2.5: Top choice for multimodal scenarios—visual analysis, long document processing
- MiniMax-M2.5: Code review and quick edits powerhouse with unbeatable value
- GLM-5.0: The “brain” for complex planning and orchestration—low hallucination matters
- DeepSeek-V3.2: Budget-friendly expert for deep work and documentation retrieval
- GLM-4.7: Balanced performer for medium-complexity tasks
After applying this configuration, the entire system feels noticeably more efficient. Each agent is doing what it excels at, and costs are more reasonable too.
If you’re also using oh-my-opencode, I’d suggest tweaking this configuration to fit your own use cases. After all, pairing each task with the right partner is what multiplies your productivity.
Note: This article is based on model data from March 2026. Benchmarks and pricing may change—please refer to the latest data.