DeepMask is built around the principle that no single model is best for every task. You can switch models at any point, including mid-conversation, without losing your context. Use the tabs below to find the right model for what you're working on right now.

Documentation Index

Fetch the complete documentation index at: https://documentation.deepmask.io/llms.txt
Use this file to discover all available pages before exploring further.
- Coding & Engineering
- Research & Analysis
- Writing & Marketing
- Fast & Lightweight
- Reasoning
- EU-hosted only
These models perform well on software engineering tasks including code generation, debugging, architecture design, and long-horizon autonomous development.
| Model | Why it fits |
|---|---|
| GPT-5.2 / 5.3 / 5.4 | Most capable OpenAI models; strong reasoning and tool use across all coding tasks |
| Sonnet 4.5 / 4.6 | Gold standard for autonomous coding; handles 30+ hour engineering sessions with 1M token context |
| Gemini 2.5 Pro | Large context window suits large codebase analysis and multi-file refactors |
| Kimi K2 (DeepMask) | Agent Swarm Mode enables 100 parallel sub-agents for complex, multi-step builds |
| DeepSeek V3 | 671B MoE model with frontier-level coding and math; strong on STEM and security analysis |
| MiniMax M2 / M2.1 | Built specifically for elite multi-language coding and advanced agent workflows |
Model capability quick reference
What does 'extended thinking' mean?
Extended thinking (sometimes shown as “reasoning mode” or “thinking mode” in the UI) causes a model to work through a problem step by step before producing its final answer. The model generates an internal chain of reasoning that it uses to improve accuracy on complex tasks.

Models in DeepMask that support extended or adaptive thinking include Sonnet 4.5 / 4.6, Haiku 4.5, Kimi K2 (DeepMask), GLM-4.7, and DeepSeek V3 (in its thinking mode). GPT-o3 Mini is also specifically optimized for reasoning tasks.

Extended thinking increases response time but significantly improves results for graduate-level math, multi-step logic, planning, and any task where intermediate reasoning matters.
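If you call models through an API rather than the UI, extended thinking is typically enabled per request. The sketch below is hypothetical: the `thinking` field, `budget_tokens` key, and default budget are illustrative assumptions, not documented DeepMask parameter names.

```python
# Hypothetical request builder for a chat API with an extended-thinking toggle.
# The "thinking" parameter and "budget_tokens" field are assumptions for
# illustration; consult the DeepMask API reference for the real field names.

def build_request(model: str, prompt: str, extended_thinking: bool = False,
                  thinking_budget: int = 8192) -> dict:
    """Assemble a chat-completion request, optionally enabling thinking mode."""
    request = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if extended_thinking:
        # Reasoning modes usually accept a token budget for the hidden
        # chain of thought; larger budgets trade latency for accuracy.
        request["thinking"] = {"enabled": True, "budget_tokens": thinking_budget}
    return request

# A reasoning-heavy task on a model that supports thinking mode:
req = build_request("deepseek-v3", "Prove that sqrt(2) is irrational.",
                    extended_thinking=True)
```

The point of the toggle is cost control: leave thinking off for quick lookups, and turn it on only for tasks where intermediate reasoning pays for the extra latency.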
Which models support image analysis?
The following models in DeepMask can accept images as input and reason about their contents:
- OpenAI: GPT-4o, GPT-4.1, GPT-5.2, GPT-5.3, GPT-5.4
- Anthropic: Opus 4.5 / 4.6, Sonnet 4.5 / 4.6, Haiku 4.5
- Google: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemma 3 27B (StackIT)
- MoonshotAI: Kimi K2 (DeepMask), Kimi K2.5 (via MoonViT multimodal)
- Alibaba: Qwen (DeepMask), Qwen3 (StackIT)
- Mistral: Mistral Large 3
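Vision-capable models generally receive images as inline base64 data alongside the text prompt. The content-block shape below is an assumption modeled on common vision APIs, not a confirmed DeepMask schema; check the DeepMask docs for the exact field names.

```python
import base64

# Hypothetical multimodal message builder. The {"type": "image", ...}
# content-block layout is an assumption modeled on common vision APIs.

def image_message(question: str, image_bytes: bytes,
                  media_type: str = "image/png") -> dict:
    """Pack a question plus an inline base64 image into one user message."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": media_type,
                        "data": encoded}},
        ],
    }

# Placeholder bytes stand in for a real PNG file read from disk.
msg = image_message("What does this chart show?", b"\x89PNG...")
```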
Which models support tool use and MCP?
Tool use (also called function calling) lets a model invoke external tools, APIs, or data connectors during a conversation. DeepMask exposes tool use through its MCP (Model Context Protocol) connector framework, which supports Google Drive, Gmail, SharePoint, Salesforce, and more.

Models with strong tool use support include:
- Anthropic: Haiku 4.5 (95% success rate on complex JSON schemas), Sonnet 4.5 / 4.6, Opus 4.5 / 4.6
- OpenAI: GPT-4o, GPT-4.1, GPT-5.x, GPT-o3 Mini
- MoonshotAI: Kimi K2 (DeepMask) — maintains coherence across 300+ sequential tool calls
- Google: Gemini 2.5 Pro, Gemini 2.5 Flash
- Alibaba: Qwen (DeepMask), Qwen3 (StackIT)
- DeepSeek: DeepSeek V3, DeepSeek V3.1 (Infercom)
- MiniMax: M2, M2.1, M2.5 (Infercom)
- Z.ai: GLM-4.7
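Whatever the model, the tool-use loop has the same shape: the client declares tools with a JSON schema, the model responds with a tool call, and the client executes it and feeds the result back on the next turn. The sketch below is a minimal illustration with an invented `get_weather` tool and a stubbed model response, not DeepMask's actual wire format.

```python
# Minimal sketch of the tool-use loop. The get_weather tool is invented
# for illustration; the schema shape follows the common function-calling
# convention (name, description, JSON-schema input).

TOOLS = [{
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def execute_tool(name: str, arguments: dict) -> str:
    """Dispatch a model-requested tool call to a local implementation."""
    if name == "get_weather":
        return f"Sunny in {arguments['city']}"   # stubbed local result
    raise ValueError(f"unknown tool: {name}")

# Pretend the model asked to call get_weather; run it and package the
# result as a tool message to send back on the next turn.
tool_call = {"name": "get_weather", "arguments": {"city": "Berlin"}}
result = execute_tool(tool_call["name"], tool_call["arguments"])
followup = {"role": "tool", "name": tool_call["name"], "content": result}
```

Figures like "300+ sequential tool calls" above refer to how many iterations of exactly this loop a model can run while keeping its plan coherent.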
What is context window size?
The context window is the maximum amount of text (measured in tokens, where 1 token ≈ 0.75 words) that a model can read and reason over in a single conversation. Larger context windows let you work with longer documents, more conversation history, and bigger codebases without losing earlier information.

Context window sizes for key models in DeepMask:
For very long documents or multi-session projects, prefer Kimi K2, the Sonnet series, or the Gemini 2.5 models. Use DeepMask Projects to persist files and instructions across sessions regardless of the model you choose.
| Model | Context window |
|---|---|
| Kimi K2 (DeepMask) | 2,000,000 tokens |
| Sonnet 4.5 / 4.6 | 1,000,000 tokens |
| Gemini 2.5 Flash | 1,040,000 tokens |
| Gemini 2.5 Pro | ~1,000,000 tokens |
| Haiku 4.5 | 200,000 tokens |
| DeepSeek V3 / V3.1 | 128,000 – 164,000 tokens |
| MiniMax M2.5 (Infercom) | 164,000 tokens |
| GPT-4o | 128,000 tokens |
| GPT-4.1 | 128,000 tokens |
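The 1 token ≈ 0.75 words rule of thumb above can be turned into a quick capacity check before you paste a large document into a conversation. This is a rough heuristic, not a real tokenizer; actual counts vary by model and language, and the 4,096-token reply reserve is an illustrative assumption.

```python
# Rough token estimate from the 1 token ≈ 0.75 words rule of thumb.
# A model-specific tokenizer gives different numbers; use this only
# for ballpark capacity planning.

def estimate_tokens(text: str) -> int:
    """Approximate token count: word count / 0.75, rounded up."""
    words = len(text.split())
    return -(-words * 4 // 3)          # ceil(words / 0.75)

def fits_in_window(text: str, window: int, reserve: int = 4096) -> bool:
    """Check a document fits, leaving `reserve` tokens for the reply."""
    return estimate_tokens(text) + reserve <= window

doc = "word " * 100_000                 # ~100k words ≈ 133k tokens
print(fits_in_window(doc, 128_000))     # False: overflows GPT-4o's window
print(fits_in_window(doc, 1_000_000))   # True: fits Sonnet 4.5 / 4.6 easily
```

A document that overflows a 128k window, as in the example, is exactly the case where the table points you toward Kimi K2, the Sonnet series, or the Gemini 2.5 models.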