GPT-OSS 120B is OpenAI’s 2026 open-source contribution to the frontier model ecosystem, available in DeepMask through two EU-hosted infrastructure providers: StackIT and Infercom. It delivers GPT-4-tier intelligence under an Apache 2.0 license, with full chain-of-thought transparency and adjustable reasoning effort — making it the model of choice for organizations that need frontier-class AI without black-box opacity or data leaving European infrastructure.

Documentation Index
Fetch the complete documentation index at: https://documentation.deepmask.io/llms.txt
Use this file to discover all available pages before exploring further.
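An llms.txt index is conventionally a markdown file whose pages are listed as `- [Title](url)` link entries. A minimal sketch of extracting page URLs from such a file, assuming that convention holds for the DeepMask index (the sample page names below are illustrative, not real DeepMask pages):

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Extract {title: url} pairs from llms.txt-style markdown link lists."""
    pages = {}
    # Match list entries of the form "- [Title](https://...): optional description"
    for title, url in re.findall(r"^- \[([^\]]+)\]\((https?://[^)]+)\)", text, re.M):
        pages[title] = url
    return pages

# Hypothetical index snippet for illustration:
sample = """# DeepMask Documentation

- [GPT-OSS 120B](https://documentation.deepmask.io/models/gpt-oss-120b): model page
- [API Reference](https://documentation.deepmask.io/api): endpoints
"""
pages = parse_llms_txt(sample)
print(pages["GPT-OSS 120B"])
```

In practice you would fetch the real file from the URL above and feed its text to the same parser.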
About GPT-OSS 120B
Built on a Mixture-of-Experts (MoE) architecture, GPT-OSS 120B uses sparse activation to stay fast and efficient — activating only a fraction of its parameters per request. The Infercom variant is optimized for high-throughput deployments, reaching up to 544 tokens/sec. The StackIT variant is tuned for sovereign enterprise deployments with a focus on transparent reasoning and strict schema enforcement. Neither variant supports image inputs.

Both the StackIT and Infercom variants of GPT-OSS 120B are hosted entirely within the European Union, making them suitable for use cases governed by GDPR and sector-specific data residency requirements.
Key Capabilities
Transparent Chain-of-Thought
Full visibility into internal reasoning steps — critical for legal, medical, and compliance use cases where “black box” AI is unacceptable.
Adjustable Reasoning Effort
Switch between Low (fast), Medium (balanced), and High (deep analytical thinking) per request to control cost and latency.
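Assuming DeepMask exposes an OpenAI-compatible chat endpoint, per-request effort selection might look like the sketch below. The `reasoning_effort` field name and the model identifier are assumptions for illustration, not confirmed DeepMask API details:

```python
def build_chat_request(prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completion payload with an adjustable reasoning effort level."""
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"effort must be 'low', 'medium', or 'high', got {effort!r}")
    return {
        "model": "gpt-oss-120b",           # model identifier is illustrative
        "reasoning_effort": effort,        # hypothetical per-request effort field
        "messages": [{"role": "user", "content": prompt}],
    }

# High effort for deep analysis; the default stays at the balanced middle setting.
deep = build_chat_request("Summarize the litigation history.", effort="high")
```

The resulting payload would then be POSTed to the provider's chat-completions URL with your API key; only the `reasoning_effort` value changes between fast and deep requests.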
JSON Mode Precision
Native strict schema enforcement ensures near-perfect reliability for API-driven agents and structured output pipelines.
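Strict schema enforcement is typically driven by a JSON Schema attached to the request. The sketch below follows the OpenAI-style `response_format` convention, which is an assumption about DeepMask's API surface; the invoice schema and the example model output are illustrative:

```python
import json

# A schema for a structured extraction task (field names are illustrative).
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total_eur": {"type": "number"},
    },
    "required": ["vendor", "total_eur"],
    "additionalProperties": False,
}

# OpenAI-style strict structured-output request fragment (assumed convention).
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "strict": True, "schema": invoice_schema},
}

# With strict mode, the model's output is constrained to parse and validate:
raw = '{"vendor": "ACME GmbH", "total_eur": 129.5}'  # example model output
parsed = json.loads(raw)
```

This is what makes the model dependable in agent pipelines: downstream code can consume `parsed` without defensive re-validation of field names or types.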
High-Speed Throughput
The Infercom variant exceeds 500 tokens/sec on optimized stacks — one of the fastest models in its weight class.
Best For
GPT-OSS 120B is ideal when you need frontier-level reasoning on-premises or within EU-hosted infrastructure. It is the right choice for legal and clinical workflows where reasoning transparency is mandatory, for privacy-sensitive production environments in finance and healthcare, and for high-volume agentic pipelines that need both speed and analytical depth. It does not support image inputs or tool use in the DeepMask interface — for those capabilities, see GPT-4o or the GPT-5 series.

Use Cases
- Clinical summarization — Processing patient histories locally under HIPAA- or GDPR-equivalent data residency requirements.
- Legal research — Analyzing sensitive litigation documents without any cloud exposure outside the EU.
- Local coding assistants — Running a high-intelligence coding model entirely on private, EU-resident infrastructure.
- STEM and technical research — Graduate-level science and mathematics reasoning with verifiable reasoning steps.
Specifications
| Specification | StackIT | Infercom |
|---|---|---|
| Provider | OpenAI (open-source) | OpenAI (open-source) |
| Hosting | EU (StackIT) | EU (Infercom) |
| Context Window | 131K tokens | 131K tokens |
| Reasoning | High | Adaptive (Low, Medium, High) |
| GPQA Diamond | 80.9% | 80.9% |
| Latency (TTFT) | 0.27s | 0.37s |
| Throughput | 262 tokens/sec | 313–544 tokens/sec |
| Image support | No | No |
| Key use cases | Agentic security, sovereign DevOps | High-speed agents, API orchestration, coding |
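The table's TTFT and throughput figures combine into a rough wall-time estimate: end-to-end time ≈ TTFT + output tokens / throughput. A quick sketch using the Infercom peak numbers above (this ignores network overhead and prompt length, so treat it as a lower bound):

```python
def estimated_wall_time(ttft_s: float, tokens: int, tokens_per_sec: float) -> float:
    """Approximate end-to-end generation time: time-to-first-token plus streaming time."""
    return ttft_s + tokens / tokens_per_sec

# 1,000 output tokens on the Infercom variant at its peak rate (544 tok/s, 0.37 s TTFT)
t = estimated_wall_time(0.37, 1000, 544)
print(f"{t:.2f} s")  # → 2.21 s
```

The same formula with the StackIT figures (262 tok/s, 0.27 s TTFT) helps pick a variant when latency budgets are tight.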