About GPT-OSS 120B
Built on a Mixture-of-Experts (MoE) architecture, GPT-OSS 120B uses sparse activation to stay fast and efficient — activating only a fraction of its parameters per request. The Infercom variant is optimized for high-throughput deployments, reaching up to 544 tokens/sec. The StackIT variant is tuned for sovereign enterprise deployments with a focus on transparent reasoning and strict schema enforcement. Neither variant supports image inputs.Both the StackIT and Infercom variants of GPT-OSS 120B are hosted entirely within the European Union, making them suitable for use cases governed by GDPR and sector-specific data residency requirements.
Key Capabilities
Transparent Chain-of-Thought
Full visibility into internal reasoning steps — critical for legal, medical, and compliance use cases where “black box” AI is unacceptable.
Adjustable Reasoning Effort
Switch between Low (fast), Medium (balanced), and High (deep analytical thinking) per request to control cost and latency.
JSON Mode Precision
Native strict schema enforcement ensures near-perfect reliability for API-driven agents and structured output pipelines.
High-Speed Throughput
The Infercom variant exceeds 500 tokens/sec on optimized stacks — one of the fastest models in its weight class.
Best For
GPT-OSS 120B is ideal when you need frontier-level reasoning on-premises or within EU-hosted infrastructure. It is the right choice for legal and clinical workflows where reasoning transparency is mandatory, for privacy-sensitive production environments in finance and healthcare, and for high-volume agentic pipelines that need both speed and analytical depth. It does not support image inputs or tool use in the DeepMask interface — for those capabilities, see GPT-4o or the GPT-5 series.Use Cases
- Clinical summarization — Processing patient histories locally under HIPAA- or GDPR-equivalent data residency requirements.
- Legal research — Analyzing sensitive litigation documents without any cloud exposure outside the EU.
- Local coding assistants — Running a high-intelligence coding model entirely on private, EU-resident infrastructure.
- STEM and technical research — Graduate-level science and mathematics reasoning with verifiable reasoning steps.
Specifications
| Specification | StackIT | Infercom |
|---|---|---|
| Provider | OpenAI (open-source) | OpenAI (open-source) |
| Hosting | EU (StackIT) | EU (Infercom) |
| Context Window | 131K tokens | 131K tokens |
| Reasoning | High | Adaptive (Low, Medium, High) |
| GPQA Diamond | 80.9% | 80.9% |
| Latency (TTFT) | 0.27s | 0.37s |
| Throughput | 262 tokens/sec | 313–544 tokens/sec |
| Image support | No | No |
| Key use cases | Agentic security, sovereign DevOps | High-speed agents, API orchestration, coding |