

GPT-OSS 120B is OpenAI’s 2025 open-source contribution to the frontier model ecosystem, available in DeepMask through two EU-hosted infrastructure providers: StackIT and Infercom. It delivers GPT-4-tier intelligence under an Apache 2.0 license, with full chain-of-thought transparency and adjustable reasoning effort — making it the model of choice for organizations that need frontier-class AI without black-box opacity or data leaving European infrastructure.

About GPT-OSS 120B

Built on a Mixture-of-Experts (MoE) architecture, GPT-OSS 120B uses sparse activation to stay fast and efficient — activating only a fraction of its parameters per request. The Infercom variant is optimized for high-throughput deployments, reaching up to 544 tokens/sec. The StackIT variant is tuned for sovereign enterprise deployments with a focus on transparent reasoning and strict schema enforcement. Neither variant supports image inputs.
Both the StackIT and Infercom variants of GPT-OSS 120B are hosted entirely within the European Union, making them suitable for use cases governed by GDPR and sector-specific data residency requirements.
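The sparse-activation idea behind MoE can be sketched in a few lines: a router scores every expert, only the top-k experts actually run, and their outputs are mixed by softmax weights. This is a toy illustration of the mechanism only — the expert count, k, and router here are made up and are not GPT-OSS 120B's real configuration.

```python
# Toy Mixture-of-Experts top-k routing: only k of the experts run per token,
# so most parameters stay idle on any given request.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, experts, router_logits, k=2):
    """Route a token to the top-k experts and mix their outputs."""
    # Rank experts by router score and keep only the top k.
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    # Only the selected experts execute; the rest are skipped entirely.
    return sum(w * experts[i](token) for w, i in zip(weights, top))

# Eight tiny scalar "experts"; only 2 of 8 run for this token.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
out = moe_forward(3.0, experts,
                  router_logits=[0.1, 2.0, 0.3, 1.5, 0.0, 0.2, 0.4, 0.1],
                  k=2)
```

With 2 of 8 experts active, the compute cost per token tracks the active parameters, not the total — which is why a large MoE model can stay fast at inference time.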

Key Capabilities

Transparent Chain-of-Thought

Full visibility into internal reasoning steps — critical for legal, medical, and compliance use cases where “black box” AI is unacceptable.

Adjustable Reasoning Effort

Switch between Low (fast), Medium (balanced), and High (deep analytical thinking) per request to control cost and latency.
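A per-request effort switch might look like the sketch below, assuming an OpenAI-compatible chat payload. The field name `reasoning_effort` and the model id are illustrative assumptions, not confirmed DeepMask API names — check the DeepMask API reference for the exact parameter.

```python
# Build a chat request with a per-call reasoning-effort setting.
# "reasoning_effort" and the model id are hypothetical field names.
def build_request(prompt, effort="medium"):
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "gpt-oss-120b",      # hypothetical model id
        "reasoning_effort": effort,   # low = fast, high = deep analysis
        "messages": [{"role": "user", "content": prompt}],
    }

# Cheap triage pass at low effort, deep contract analysis at high effort.
fast = build_request("Classify this support ticket.", effort="low")
deep = build_request("Assess the indemnity clauses in this contract.", effort="high")
```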

JSON Mode Precision

Native strict schema enforcement ensures near-perfect reliability for API-driven agents and structured output pipelines.
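The value of strict enforcement for agent pipelines is that outputs which drift from the contract are rejected rather than silently passed along. The minimal checker below illustrates that idea only; the schema shape and validation are this sketch's own, not the DeepMask API's actual mechanism.

```python
# Reject any model output whose shape drifts from the expected contract.
# The schema here is illustrative, not an official DeepMask format.
import json

SCHEMA = {"required": {"verdict": str, "confidence": float}}

def parse_strict(raw):
    data = json.loads(raw)
    for key, typ in SCHEMA["required"].items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    return data

ok = parse_strict('{"verdict": "compliant", "confidence": 0.93}')
```

In practice, native schema enforcement moves this check into the model's decoding step, so malformed outputs are never generated in the first place.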

High-Speed Throughput

The Infercom variant exceeds 500 tokens/sec on optimized stacks — one of the fastest models in its weight class.
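A back-of-envelope way to turn these figures into wall-clock expectations: total generation time is roughly time-to-first-token plus output tokens divided by throughput, using the TTFT and throughput values from the specifications table below.

```python
# Rough generation-time estimate: TTFT + tokens / throughput.
def generation_time(tokens, ttft_s, tokens_per_sec):
    return ttft_s + tokens / tokens_per_sec

# A 1,000-token answer on the Infercom variant (0.37 s TTFT):
worst = generation_time(1000, 0.37, 313)  # lower end of the throughput range
best = generation_time(1000, 0.37, 544)   # upper end of the throughput range
```

This ignores queuing and network overhead, so treat it as a lower bound rather than an SLA.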

Best For

GPT-OSS 120B is ideal when you need frontier-level reasoning on-premises or within EU-hosted infrastructure. It is the right choice for legal and clinical workflows where reasoning transparency is mandatory, for privacy-sensitive production environments in finance and healthcare, and for high-volume agentic pipelines that need both speed and analytical depth. It does not support image inputs or tool use in the DeepMask interface — for those capabilities, see GPT-4o or the GPT-5 series.
For legal and compliance workflows, use High reasoning effort to maximize analytical depth. For high-volume document classification or extraction pipelines, Medium effort typically provides the best cost-per-quality tradeoff.
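The guidance above can be wired into a pipeline as a simple workload-to-effort lookup. The category names below are examples chosen for illustration, not an official taxonomy.

```python
# Map workload categories to reasoning effort, per the guidance above.
# Category names are illustrative, not an official taxonomy.
EFFORT_BY_WORKLOAD = {
    "legal_review": "high",               # maximize analytical depth
    "compliance_audit": "high",
    "document_classification": "medium",  # best cost-per-quality tradeoff
    "extraction": "medium",
}

def pick_effort(workload):
    # Default to the balanced setting for anything unrecognized.
    return EFFORT_BY_WORKLOAD.get(workload, "medium")
```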

Use Cases

  • Clinical summarization — Processing patient histories locally under HIPAA- or GDPR-equivalent data residency requirements.
  • Legal research — Analyzing sensitive litigation documents without any cloud exposure outside the EU.
  • Local coding assistants — Running a high-intelligence coding model entirely on private, EU-resident infrastructure.
  • STEM and technical research — Graduate-level science and mathematics reasoning with verifiable reasoning steps.

Specifications

Specification    StackIT                              Infercom
Provider         OpenAI (open-source)                 OpenAI (open-source)
Hosting          EU (StackIT)                         EU (Infercom)
Context Window   131K tokens                          131K tokens
Reasoning        High                                 Adaptive (Low, Medium, High)
GPQA Diamond     80.9%                                80.9%
Latency (TTFT)   0.27s                                0.37s
Throughput       262 tokens/sec                       313–544 tokens/sec
Image support    No                                   No
Key use cases    Agentic security, sovereign DevOps   High-speed agents, API orchestration, coding
Try GPT-OSS 120B in DeepMask →