1. Why Assign Different Models to Different Agents?

While tinkering with the oh-my-opencode plugin recently, I stumbled upon an interesting problem: the plugin developers recommend different LLM providers for different agents, but I have a collection of Chinese models at my disposal (GLM-4.7, GLM-5.0, MiniMax-M2.5, DeepSeek-V3.2, Kimi-K2.5). How should I allocate them to get the most out of each?
Just as on a team where some people excel at design, others at coding, and others at documentation, models should be assigned the same way: there's no single best model, only the most suitable one for each task.
2. Analyzing the “Personalities” of Five Chinese LLMs

Before assigning tasks, I needed to understand what each model excels at. I spent time digging through their benchmark data and discovered that each model has its own “superpower.”
2.1 GLM Series: Zhipu AI’s Twin Stars
GLM-4.7 (355B parameters, 32B active)
- Strong mathematical reasoning (MATH 92%)
- Solid coding capabilities (LiveCodeBench 84.9%)
- Multimodal support
- Moderate pricing ($0.60 input / $2.20 output per M tokens)
GLM-5.0 (744B parameters, 40B active)
- Doubled parameters, but math benchmark actually dropped (MATH 88%)
- SOTA-level performance on Agent tasks
- 56% lower hallucination rate than 4.7 (this matters!)
- Most expensive of the five ($1.00 input / $3.20 output per M tokens)
Takeaway: GLM-5.0 seems purpose-built for complex agentic tasks. It may not match 4.7 on math benchmarks, but it's more stable and reliable. Think of it as a "brain" rather than a "calculator."
2.2 MiniMax-M2.5: The Bang-for-Buck Champion
Key Stats (~230B parameters, 10B active)
- Highest SWE-bench Verified score (80.2%)
- Blazing fast inference (Lightning mode: 100 tok/s)
- Cheapest option ($0.30 input / $1.20 output per M tokens)
- Storage-friendly (quantizable to 96GB)
Takeaway: This is the legendary “fast and frugal” option. If you need extensive code reviews and quick modifications, you can’t go wrong with this one.
2.3 DeepSeek-V3.2: Math Genius + Penny Pincher
Key Stats (671B parameters, 37B active)
- Highest AIME 2026 score (94.17%)
- IMO/IOI gold medal level
- Extremely cheap ($0.28 input / $0.42 output per M tokens, 27x cheaper than GPT-4o)
- Text-only mode
Takeaway: If you need deep reasoning and long-running autonomous work without burning through your budget, this is your best bet.
2.4 Kimi-K2.5: The Multimodal All-Rounder
Key Stats (1T parameters, 32B active)
- Largest context window (256K)
- Strongest multimodal capabilities (MMMU 78.5%, OCRBench 92.3%)
- Video understanding support
- Agent Swarm (up to 100 sub-agents)
Takeaway: When you need to process images, videos, or long documents, this is your heavyweight champion.
3. Understanding oh-my-opencode’s Agent Architecture
Before assigning models, I studied oh-my-opencode’s agent architecture and discovered two categories of agents:
3.1 Primary Agents (Follow UI Model Selection)
- Sisyphus: The conductor, orchestrates tasks and delegates work
- Hephaestus: Deep worker, executes tasks end-to-end
- Atlas: Main model for UI interactions
- Prometheus: Strategic planner
3.2 Subagents (Independent Model Config)
- Oracle: Complex debugging, architecture consultant (EXPENSIVE)
- Librarian: Documentation retrieval, external library queries (CHEAP)
- Explore: Codebase search specialist (CHEAP)
- Metis: Pre-planning consultant, identifies implicit intents (EXPENSIVE)
- Momus: Plan reviewer (CHEAP)
- Multimodal-looker: Image/video analysis (EXPENSIVE)
3.3 Categories (Auto-invoked by Task Type)
There are also 8 Categories automatically invoked based on task type:
- visual-engineering: Frontend UI
- ultrabrain: Complex logic
- deep: Deep work
- artistry: Creative tasks
- quick: Quick modifications
- unspecified-low/high: Simple/complex tasks
- writing: Documentation generation
4. Configuration Strategy and Principles

4.1 Principle 1: Right Tool for the Right Job
Multimodal Tasks → Kimi-K2.5
- Why: MMMU 78.5%, MathVision 84.2%, OCRBench 92.3%
- Best for: UI design, image analysis, video understanding
Code Analysis → MiniMax-M2.5
- Why: SWE-bench Verified 80.2% (top score)
- Best for: Code review, debugging, architecture analysis
Complex Reasoning → GLM-5.0
- Why: Low hallucination rate (56% lower than 4.7), SOTA on Agent tasks
- Best for: Complex planning, architecture design, orchestration
Cost Optimization → DeepSeek-V3.2
- Why: Dirt cheap ($0.28/M), strong math capabilities
- Best for: Documentation retrieval, long-running autonomous work
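The "right tool for the right job" rule boils down to a lookup from task type to model. A minimal sketch, assuming made-up task labels and model IDs for illustration only (they are not oh-my-opencode identifiers):

```python
# Hypothetical task-to-model routing table illustrating Principle 1;
# the task labels and model IDs are made up for this sketch.
MODEL_FOR_TASK = {
    "multimodal": "kimi-k2.5",    # MMMU 78.5%, OCRBench 92.3%
    "code": "minimax-m2.5",       # SWE-bench Verified 80.2%
    "reasoning": "glm-5.0",       # lowest hallucination rate
    "budget": "deepseek-v3.2",    # $0.28/M input tokens
}

def pick_model(task_type: str) -> str:
    """Return the preferred model for a task type, falling back to the
    cheap generalist when the task is unclassified."""
    return MODEL_FOR_TASK.get(task_type, "deepseek-v3.2")

print(pick_model("code"))     # minimax-m2.5
print(pick_model("unknown"))  # deepseek-v3.2
```

The fallback encodes Principle 2 in miniature: when in doubt, default to the cheap option and reserve premium models for tasks you have explicitly classified as hard.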
4.2 Principle 2: Reserve Premium Models for Critical Tasks
Use stronger models for EXPENSIVE-tier agents (Oracle, Metis, Multimodal-looker), and value-oriented models for CHEAP-tier agents (Librarian, Explore, Momus).
5. Final Configuration
5.1 Agents Configuration Table
| Agent | Selected Model | Reason |
|---|---|---|
| hephaestus | DeepSeek-V3.2 | Deep autonomous work requiring long runtime—pick the cheapest |
| oracle | MiniMax-M2.5 | Highest SWE-bench score, strong code analysis |
| librarian | DeepSeek-V3.2 | Doc retrieval doesn’t need heavy lifting—go cheap |
| explore | GLM-4.7 | Code search needs balanced performance and cost |
| multimodal-looker | Kimi-K2.5 | Visual analysis requires the strongest multimodal model |
| prometheus | GLM-5.0 | Strategic planning needs low hallucination and strong reasoning |
| metis | Kimi-K2.5 | Intent analysis requires strong understanding and long context |
| momus | MiniMax-M2.5 | Plan review needs speed and accuracy |
| atlas | Kimi-K2.5 | UI interaction needs multimodal support |
5.2 Categories Configuration Table
| Category | Selected Model | Reason |
|---|---|---|
| visual-engineering | Kimi-K2.5 | Frontend UI design needs multimodal capabilities |
| ultrabrain | GLM-5.0 | Complex logic needs strongest reasoning with low hallucination |
| deep | DeepSeek-V3.2 | Deep work needs long runtime—optimize for cost |
| artistry | Kimi-K2.5 | Creative tasks need multimodal and Agent Swarm |
| quick | MiniMax-M2.5 | Quick modifications need fast response and low cost |
| unspecified-low | MiniMax-M2.5 | Simple tasks use best value option |
| unspecified-high | GLM-5.0 | Complex tasks use strongest reasoning |
| writing | Kimi-K2.5 | Long documents need 256K context |
6. Configuration Code
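The tables in section 5 translate into opencode's JSON config roughly as follows. This is a sketch, not the plugin's exact schema: the `agent`/`categories` key layout and the provider prefixes (`deepseek/`, `minimax/`, `zhipu/`, `moonshot/`) are assumptions from my setup, so check the oh-my-opencode README for the real field names and your provider's actual model IDs:

```json
{
  "agent": {
    "hephaestus": { "model": "deepseek/deepseek-v3.2" },
    "oracle": { "model": "minimax/minimax-m2.5" },
    "librarian": { "model": "deepseek/deepseek-v3.2" },
    "explore": { "model": "zhipu/glm-4.7" },
    "multimodal-looker": { "model": "moonshot/kimi-k2.5" },
    "prometheus": { "model": "zhipu/glm-5.0" },
    "metis": { "model": "moonshot/kimi-k2.5" },
    "momus": { "model": "minimax/minimax-m2.5" },
    "atlas": { "model": "moonshot/kimi-k2.5" }
  },
  "categories": {
    "visual-engineering": "moonshot/kimi-k2.5",
    "ultrabrain": "zhipu/glm-5.0",
    "deep": "deepseek/deepseek-v3.2",
    "artistry": "moonshot/kimi-k2.5",
    "quick": "minimax/minimax-m2.5",
    "unspecified-low": "minimax/minimax-m2.5",
    "unspecified-high": "zhipu/glm-5.0",
    "writing": "moonshot/kimi-k2.5"
  }
}
```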
7. Cost Optimization Results

Compared to using opencode/glm-4.7-free ($0.60/M) for everything:
| Task Type | Original Cost | New Cost | Savings |
|---|---|---|---|
| Doc Retrieval (Librarian) | $0.60/M | $0.28/M | 53% |
| Quick Edits (Quick) | $0.60/M | $0.30/M | 50% |
| Deep Work (Deep) | $0.60/M | $0.28/M | 53% |
| Code Review (Momus) | $0.60/M | $0.30/M | 50% |
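The savings column is straightforward arithmetic over the per-M-token input prices: (old − new) / old, rounded to whole percents. A quick check using the prices from the table:

```python
# Verify the savings column: (baseline - new) / baseline,
# using the per-M-token input prices from the table above.
baseline = 0.60  # opencode/glm-4.7-free, $/M input tokens

new_prices = {
    "Doc Retrieval (Librarian)": 0.28,  # DeepSeek-V3.2
    "Quick Edits (Quick)": 0.30,        # MiniMax-M2.5
    "Deep Work (Deep)": 0.28,           # DeepSeek-V3.2
    "Code Review (Momus)": 0.30,        # MiniMax-M2.5
}

for task, price in new_prices.items():
    savings = (baseline - price) / baseline
    print(f"{task}: {savings:.0%}")  # 53% or 50%, matching the table
```

Note this compares list prices per token, not total spend; actual savings depend on how many tokens each agent consumes in practice.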
8. Lessons Learned and Tips
8.1 Don’t Blindly Chase “Newest and Strongest”
GLM-5.0 has twice the parameters of 4.7, but its math benchmark actually dropped. Parameter count isn’t everything—focus on your specific task requirements.
8.2 Multimodal Capabilities Really Matter
I initially underestimated the importance of multimodal features. When you need to analyze UI screenshots, process charts, or understand code flow diagrams, Kimi-K2.5 noticeably outperforms the rest.
8.3 Optimize Cost-Sensitive Tasks Separately
CHEAP-tier agents like Librarian and Explore are called frequently but don’t need heavy lifting. Switching to DeepSeek-V3.2 significantly reduced overall costs.
8.4 Reserve EXPENSIVE Agents for Critical Scenarios
Don’t cheap out on EXPENSIVE-tier agents like Oracle and Metis. They handle complex tasks that require strong reasoning capabilities.
8.5 Testing Beats Theory
After configuration, I recommend testing these typical scenarios:
- Code search (triggers Explore)
- Documentation retrieval (triggers Librarian)
- Visual analysis (triggers Multimodal-looker)
- Complex architecture design (triggers Oracle or Ultrabrain)
9. Conclusion
The biggest takeaway from this configuration exercise: there’s no best model, only the most suitable one.
- Kimi-K2.5: Top choice for multimodal scenarios—visual analysis, long document processing
- MiniMax-M2.5: Code review and quick edits powerhouse with unbeatable value
- GLM-5.0: The “brain” for complex planning and orchestration—low hallucination matters
- DeepSeek-V3.2: Budget-friendly expert for deep work and documentation retrieval
- GLM-4.7: Balanced performer for medium-complexity tasks
After applying this configuration, the entire system feels noticeably more efficient. Each agent is doing what it excels at, and costs are more reasonable too.
If you’re also using oh-my-opencode, I’d suggest tweaking this configuration to fit your own use cases. After all, pairing each task with the right partner is what multiplies your productivity.
Note: This article is based on model data from March 2026. Benchmarks and pricing may change—please refer to the latest data.