
Finding the Perfect Partner for Your AI Agents - oh-my-opencode Model Selection Guide

How do you choose the best Chinese LLMs for different agents in oh-my-opencode? Based on deep research into GLM-4.7, GLM-5.0, MiniMax-M2.5, DeepSeek-V3.2, and Kimi-K2.5, this article provides a battle-tested configuration strategy to help you achieve the optimal balance between performance and cost.

1. Why Assign Different Models to Different Agents?


While tinkering with the oh-my-opencode plugin recently, I stumbled upon an interesting problem: the plugin developers recommend different LLM providers for different agents, but I have a collection of Chinese models at my disposal (GLM-4.7, GLM-5.0, MiniMax-M2.5, DeepSeek-V3.2, Kimi-K2.5). How should I allocate them to get the most out of each?

Just like in a team where some people excel at design, others at coding, and others at documentation—models should be assigned the same way: there’s no single best model, only the most suitable one for each task.

2. Analyzing the “Personalities” of Five Chinese LLMs


Before assigning tasks, I needed to understand what each model excels at. I spent time digging through their benchmark data and discovered that each model has its own “superpower.”

2.1 GLM Series: Zhipu AI’s Twin Stars

GLM-4.7 (355B parameters, 32B active)

  • Strong mathematical reasoning (MATH 92%)
  • Solid coding capabilities (LiveCodeBench 84.9%)
  • Multimodal support
  • Moderate pricing ($0.60/$2.20)

GLM-5.0 (744B parameters, 40B active)

  • Doubled parameters, but math benchmark actually dropped (MATH 88%)
  • SOTA-level performance on Agent tasks
  • 56% lower hallucination rate than 4.7 (this matters!)
  • Most expensive ($1.00/$3.20)

Takeaway: GLM-5.0 seems purpose-built for complex tasks. While it may not solve math problems as quickly as 4.7, it’s more stable and reliable. Think of it as a “brain” rather than a “calculator.”

2.2 MiniMax-M2.5: The Bang-for-Buck Champion

Key Stats (~230B parameters, 10B active)

  • Highest SWE-bench Verified score (80.2%)
  • Blazing fast inference (Lightning mode: 100 tok/s)
  • Cheapest option ($0.30/$1.20)
  • Storage-friendly (quantizable to 96GB)

Takeaway: This is the legendary “fast and frugal” option. If you need extensive code reviews and quick modifications, you can’t go wrong with this one.

2.3 DeepSeek-V3.2: Math Genius + Penny Pincher

Key Stats (671B parameters, 37B active)

  • Highest AIME 2026 score (94.17%)
  • IMO/IOI gold medal level
  • Insanely cheap ($0.28/$0.42, 27x cheaper than GPT-4o)
  • Text-only mode

Takeaway: If you need deep reasoning and long-running autonomous work without burning through your budget, this is your best bet.

2.4 Kimi-K2.5: The Multimodal All-Rounder

Key Stats (1T parameters, 32B active)

  • Largest context window (256K)
  • Strongest multimodal capabilities (MMMU 78.5%, OCRBench 92.3%)
  • Video understanding support
  • Agent Swarm (up to 100 sub-agents)

Takeaway: When you need to process images, videos, or long documents, this is your heavyweight champion.
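The per-million-token prices quoted above make quick cost comparisons straightforward. Here is a minimal sketch of a cost estimator using only the figures from this article (Kimi-K2.5's pricing isn't listed here, so it's omitted):

```python
# Per-million-token prices (input, output) in USD, as quoted in this article.
# Kimi-K2.5's pricing is not listed above, so it is omitted.
PRICING = {
    "glm-4.7":       (0.60, 2.20),
    "glm-5.0":       (1.00, 3.20),
    "minimax-m2.5":  (0.30, 1.20),
    "deepseek-v3.2": (0.28, 0.42),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from the rate table."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a 50K-input / 5K-output call costs about $0.016 on deepseek-v3.2 versus roughly $0.066 on glm-5.0, which is the kind of gap that adds up fast for frequently invoked agents.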

3. Understanding oh-my-opencode’s Agent Architecture

Before assigning models, I studied oh-my-opencode’s agent architecture and discovered two categories of agents:

3.1 Primary Agents (Follow UI Model Selection)

  • Sisyphus: The conductor, orchestrates tasks and delegates work
  • Hephaestus: Deep worker, executes tasks end-to-end
  • Atlas: Main model for UI interactions
  • Prometheus: Strategic planner

3.2 Subagents (Independent Model Config)

  • Oracle: Complex debugging, architecture consultant (EXPENSIVE)
  • Librarian: Documentation retrieval, external library queries (CHEAP)
  • Explore: Codebase search specialist (CHEAP)
  • Metis: Pre-planning consultant, identifies implicit intents (EXPENSIVE)
  • Momus: Plan reviewer (CHEAP)
  • Multimodal-looker: Image/video analysis (EXPENSIVE)

3.3 Categories (Auto-invoked by Task Type)

There are also 8 Categories automatically invoked based on task type:

  • visual-engineering: Frontend UI
  • ultrabrain: Complex logic
  • deep: Deep work
  • artistry: Creative tasks
  • quick: Quick modifications
  • unspecified-low: Simple tasks
  • unspecified-high: Complex tasks
  • writing: Documentation generation

4. Configuration Strategy and Principles


4.1 Principle 1: Right Tool for the Right Job

Multimodal Tasks → Kimi-K2.5

  • Why: MMMU 78.5%, MathVision 84.2%, OCRBench 92.3%
  • Best for: UI design, image analysis, video understanding

Code Analysis → MiniMax-M2.5

  • Why: SWE-bench Verified 80.2% (top score)
  • Best for: Code review, debugging, architecture analysis

Complex Reasoning → GLM-5.0

  • Why: Low hallucination rate (56% lower than 4.7), SOTA on Agent tasks
  • Best for: Complex planning, architecture design, orchestration

Cost Optimization → DeepSeek-V3.2

  • Why: Dirt cheap ($0.28/M), strong math capabilities
  • Best for: Documentation retrieval, long-running autonomous work

4.2 Principle 2: Reserve Premium Models for Critical Tasks

Use stronger models for EXPENSIVE-tier agents (Oracle, Metis, Multimodal-looker), and value-oriented models for CHEAP-tier agents (Librarian, Explore, Momus).
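This tier principle can be sketched as a default lookup. The helper below is hypothetical (not part of oh-my-opencode), and these are deliberately simplified defaults; the per-agent table in the next section overrides several of them:

```python
# Hypothetical sketch of Principle 2: default model choice by subagent tier.
# Simplified defaults only; the final per-agent configuration in this
# article overrides several of them (e.g. oracle gets MiniMax-M2.5).
TIER_DEFAULT = {
    "expensive": "opencode/go-glm-5",            # premium reasoning tier
    "cheap": "volcengine-coding/deepseek-v3.2",  # value tier
}

AGENT_TIER = {
    "oracle": "expensive", "metis": "expensive", "multimodal-looker": "expensive",
    "librarian": "cheap", "explore": "cheap", "momus": "cheap",
}

def default_model(agent: str) -> str:
    """Return the tier-based default model for a subagent."""
    return TIER_DEFAULT[AGENT_TIER[agent]]
```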

5. Final Configuration

5.1 Agents Configuration Table

| Agent | Selected Model | Reason |
|-------|----------------|--------|
| hephaestus | DeepSeek-V3.2 | Deep autonomous work with long runtimes, so pick the cheapest |
| oracle | MiniMax-M2.5 | Highest SWE-bench score, strong code analysis |
| librarian | DeepSeek-V3.2 | Doc retrieval doesn't need heavy lifting, so go cheap |
| explore | GLM-4.7 | Code search needs balanced performance and cost |
| multimodal-looker | Kimi-K2.5 | Visual analysis requires the strongest multimodal model |
| prometheus | GLM-5.0 | Strategic planning needs low hallucination and strong reasoning |
| metis | Kimi-K2.5 | Intent analysis requires strong understanding and long context |
| momus | MiniMax-M2.5 | Plan review needs speed and accuracy |
| atlas | Kimi-K2.5 | UI interaction needs multimodal support |

5.2 Categories Configuration Table

| Category | Selected Model | Reason |
|----------|----------------|--------|
| visual-engineering | Kimi-K2.5 | Frontend UI design needs multimodal capabilities |
| ultrabrain | GLM-5.0 | Complex logic needs strongest reasoning with low hallucination |
| deep | DeepSeek-V3.2 | Deep work needs long runtime, so optimize for cost |
| artistry | Kimi-K2.5 | Creative tasks need multimodal and Agent Swarm |
| quick | MiniMax-M2.5 | Quick modifications need fast response and low cost |
| unspecified-low | MiniMax-M2.5 | Simple tasks use the best value option |
| unspecified-high | GLM-5.0 | Complex tasks use the strongest reasoning |
| writing | Kimi-K2.5 | Long documents need 256K context |

6. Configuration Code

```json
{
  "$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/dev/assets/oh-my-opencode.schema.json",
  "agents": {
    "hephaestus": {
      "model": "volcengine-coding/deepseek-v3.2"
    },
    "oracle": {
      "model": "volcengine-coding/minimax-m2.5"
    },
    "librarian": {
      "model": "volcengine-coding/deepseek-v3.2"
    },
    "explore": {
      "model": "opencode/go-glm-4.7"
    },
    "multimodal-looker": {
      "model": "volcengine-coding/kimi-k2.5"
    },
    "prometheus": {
      "model": "opencode/go-glm-5"
    },
    "metis": {
      "model": "volcengine-coding/kimi-k2.5"
    },
    "momus": {
      "model": "volcengine-coding/minimax-m2.5"
    },
    "atlas": {
      "model": "volcengine-coding/kimi-k2.5"
    }
  },
  "categories": {
    "visual-engineering": {
      "model": "volcengine-coding/kimi-k2.5"
    },
    "ultrabrain": {
      "model": "opencode/go-glm-5"
    },
    "deep": {
      "model": "volcengine-coding/deepseek-v3.2"
    },
    "artistry": {
      "model": "volcengine-coding/kimi-k2.5"
    },
    "quick": {
      "model": "volcengine-coding/minimax-m2.5"
    },
    "unspecified-low": {
      "model": "volcengine-coding/minimax-m2.5"
    },
    "unspecified-high": {
      "model": "opencode/go-glm-5"
    },
    "writing": {
      "model": "volcengine-coding/kimi-k2.5"
    }
  }
}
```

7. Cost Optimization Results


Compared to using opencode/glm-4.7-free ($0.60/M) for everything:

| Task Type | Original Cost | New Cost | Savings |
|-----------|---------------|----------|---------|
| Doc Retrieval (Librarian) | $0.60/M | $0.28/M | 53% |
| Quick Edits (Quick) | $0.60/M | $0.30/M | 50% |
| Deep Work (Deep) | $0.60/M | $0.28/M | 53% |
| Code Review (Momus) | $0.60/M | $0.30/M | 50% |
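The savings column follows directly from the per-million input prices quoted earlier in this article, and can be recomputed in a few lines:

```python
# Recompute the savings percentages from the input prices in this article.
BASELINE = 0.60  # opencode/glm-4.7-free, $/M input tokens
NEW_PRICE = {
    "Doc Retrieval (Librarian)": 0.28,
    "Quick Edits (Quick)": 0.30,
    "Deep Work (Deep)": 0.28,
    "Code Review (Momus)": 0.30,
}

for task, price in NEW_PRICE.items():
    savings_pct = round((1 - price / BASELINE) * 100)
    print(f"{task}: {savings_pct}% saved")
```

This reproduces the 53% and 50% figures in the table. Note these are input-token rates only; actual savings depend on your input/output mix.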

8. Lessons Learned and Tips

8.1 Don’t Blindly Chase “Newest and Strongest”

GLM-5.0 has twice the parameters of 4.7, but its math benchmark actually dropped. Parameter count isn’t everything—focus on your specific task requirements.

8.2 Multimodal Capabilities Really Matter

I initially underestimated the importance of multimodal features. When you need to analyze UI screenshots, process charts, or understand code flow diagrams, Kimi-K2.5 noticeably outperforms the rest.

8.3 Optimize Cost-Sensitive Tasks Separately

CHEAP-tier agents like Librarian and Explore are called frequently but don’t need heavy lifting. Switching to DeepSeek-V3.2 significantly reduced overall costs.

8.4 Reserve EXPENSIVE Agents for Critical Scenarios

Don’t cheap out on EXPENSIVE-tier agents like Oracle and Metis. They handle complex tasks that require strong reasoning capabilities.

8.5 Testing Beats Theory

After configuration, I recommend testing these typical scenarios:

  • Code search (triggers Explore)
  • Documentation retrieval (triggers Librarian)
  • Visual analysis (triggers Multimodal-looker)
  • Complex architecture design (triggers Oracle or Ultrabrain)

9. Conclusion

The biggest takeaway from this configuration exercise: there’s no best model, only the most suitable one.

  • Kimi-K2.5: Top choice for multimodal scenarios—visual analysis, long document processing
  • MiniMax-M2.5: Code review and quick edits powerhouse with unbeatable value
  • GLM-5.0: The “brain” for complex planning and orchestration—low hallucination matters
  • DeepSeek-V3.2: Budget-friendly expert for deep work and documentation retrieval
  • GLM-4.7: Balanced performer for medium-complexity tasks

After applying this configuration, the entire system feels noticeably more efficient. Each agent is doing what it excels at, and costs are more reasonable too.

If you’re also using oh-my-opencode, I’d suggest tweaking the configuration based on your own use cases. After all, finding the right partner for the job is what doubles your productivity.


Note: This article is based on model data from March 2026. Benchmarks and pricing may change—please refer to the latest data.