Haiku 4.5 is Anthropic’s most efficient model — designed to deliver 2025 flagship-class intelligence at a fraction of the latency and cost. With a first-token latency of 0.20 seconds and throughput exceeding 180 tokens/sec, it is the first choice for real-time UI agents, high-frequency content moderation, and enterprise workloads that require millions of requests per hour without performance degradation.

About Haiku 4.5

Released in late 2025, Claude Haiku 4.5 brings enhanced computer use capabilities to the lightweight tier — making it practical not just for chat and classification, but also for simple navigation and form-filling in web browsers and desktop environments. It achieves a 73.2% GPQA Diamond score and a 95% success rate on complex JSON schemas, making it reliable for structured tool-calling pipelines at high volume.

Key Capabilities

Instant Tool Calling

Optimized for high-speed function execution with a 95% success rate on complex JSON schemas — reliable enough for production API pipelines.
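Structured tool calling typically means the model emits a JSON object naming a tool and its arguments, which your code parses and routes to a real function. A minimal sketch of that dispatch step, using a hypothetical `get_weather` tool (the exact wire format depends on the API you call):

```python
import json

# A hypothetical tool definition in JSON-schema style; the exact wire
# format depends on the API you call. This is an illustrative sketch.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Stub implementation for the example.
    return f"Sunny in {city}"

TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(raw_call: str) -> str:
    """Parse a model-emitted tool call ({"name": ..., "input": {...}})
    and route it to the matching Python function."""
    call = json.loads(raw_call)
    fn = TOOL_REGISTRY[call["name"]]
    return fn(**call["input"])

print(dispatch_tool_call('{"name": "get_weather", "input": {"city": "Oslo"}}'))
# → Sunny in Oslo
```

In a production pipeline the registry lookup and `json.loads` would be wrapped in error handling, since even a 95% schema success rate leaves a tail of malformed calls to catch.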

Computer Use (Light)

Handles simple navigation and form-filling within web browsers and desktop environments without requiring a heavier model.
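Computer-use agents work as an action/observation loop: the model emits small discrete actions (click, type, navigate) and your harness applies them to the environment. A minimal sketch with hypothetical action names, using an in-memory form in place of a real browser:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # e.g. "type" or "click"; names are illustrative
    target: str
    text: str = ""

def apply_action(form: dict, action: Action) -> dict:
    """Apply one form-filling action to an in-memory form state.
    A real agent would drive a browser; this stub just records values."""
    if action.kind == "type":
        form[action.target] = action.text
    elif action.kind == "click" and action.target == "submit":
        form["submitted"] = True
    return form

form = {}
for a in [Action("type", "email", "a@example.com"), Action("click", "submit")]:
    form = apply_action(form, a)
print(form)  # {'email': 'a@example.com', 'submitted': True}
```

The same loop structure applies whether the executor is a lightweight model handling simple forms or a heavier one handling multi-step navigation.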

Massive Throughput

Sustains 180+ tokens/sec across enterprise workloads that require millions of requests per hour without performance degradation.
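To actually reach millions of requests per hour, the client side matters as much as the model: a common pattern is to fan out requests asynchronously while capping in-flight concurrency. A sketch using `asyncio`, with a sleep stub standing in for the real API call:

```python
import asyncio

async def call_model(prompt: str) -> str:
    # Stand-in for a real API call; sleeps to simulate network latency.
    await asyncio.sleep(0.01)
    return f"ok:{prompt}"

async def run_batch(prompts, max_concurrent=100):
    """Fan out many requests while capping in-flight concurrency,
    a common pattern for sustaining high aggregate throughput."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(p):
        async with sem:
            return await call_model(p)

    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(run_batch([f"req-{i}" for i in range(5)]))
print(results)
```

The semaphore keeps the client from exceeding its rate limits while still saturating available throughput.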

Advanced Content Moderation

Context-aware safety filters reduce false positives by 40% compared to previous versions, making moderation pipelines more precise.
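In practice, a moderation pipeline maps a model-reported risk score to a decision, with a middle band routed to human review; it is that middle band, rather than a hard binary cutoff, that keeps false positives down. A sketch with illustrative thresholds (not values from this model):

```python
def route_verdict(score: float, allow_below=0.4, block_above=0.8) -> str:
    """Map a model-reported risk score to a moderation decision.
    Thresholds here are illustrative; tune them against your own data
    to balance false positives against missed violations."""
    if score < allow_below:
        return "allow"
    if score >= block_above:
        return "block"
    return "human_review"

print([route_verdict(s) for s in (0.1, 0.5, 0.9)])
# → ['allow', 'human_review', 'block']
```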

Best For

Haiku 4.5 is the right choice when speed and cost efficiency matter more than maximum reasoning depth. It is ideal as the execution layer in multi-agent systems — acting as the fast “hands and eyes” for a larger orchestrating model like Opus or Sonnet. For tasks that require deeper reasoning, complex document analysis, or autonomous multi-step workflows, Sonnet 4.5 or 4.6 is the appropriate step up.
In multi-agent architectures, consider using Haiku 4.5 to handle repetitive, low-level sub-tasks (data labeling, form extraction, moderation checks) while reserving Sonnet or Opus for planning, summarization, and complex reasoning steps. This pattern can reduce overall costs by 60–80%.
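The savings in this pattern come from routing the bulk of calls to the cheaper model. The arithmetic can be sketched with hypothetical per-call prices (placeholders, not real rates for either model):

```python
def blended_cost(total_calls, cheap_share, cheap_price, large_price):
    """Cost of a workload where a share of calls goes to the cheaper
    model. Prices are hypothetical placeholders, not real rates."""
    cheap_calls = total_calls * cheap_share
    large_calls = total_calls - cheap_calls
    return cheap_calls * cheap_price + large_calls * large_price

# Hypothetical: 1M calls, cheap model at 1 unit/call, larger at 10.
all_large = blended_cost(1_000_000, 0.0, 1, 10)
mostly_cheap = blended_cost(1_000_000, 0.9, 1, 10)
savings = 1 - mostly_cheap / all_large
print(f"{savings:.0%}")  # → 81% under these assumed prices
```

Under these assumed prices, routing 90% of calls to the cheaper model cuts cost by roughly 80%, consistent with the 60–80% range cited above; actual savings depend on real price ratios and the split your workload allows.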

Use Cases

  • Customer support chatbots — Near-instant, high-quality responses for global user bases at scale.
  • Large-scale data labeling — Classifying millions of records with high accuracy for research and model training pipelines.
  • Sub-agent swarms — Acting as the fast execution layer for larger orchestrating models handling repetitive, low-level tasks.
  • Real-time content moderation — Context-aware filtering for user-generated content platforms requiring high throughput.

Specifications

Provider: Anthropic
Context window: 200K tokens
Reasoning: Adaptive (Standard/High)
GPQA Diamond: 73.2%
Latency (TTFT): 0.20s
Throughput: 180+ tokens/sec
Key use cases: High-volume support, coding sub-agents, real-time data
Try Haiku 4.5 in DeepMask →