Haiku 4.5 is Anthropic’s most efficient model, designed to deliver 2025 flagship-class intelligence at a fraction of the latency and cost. With a first-token latency of 0.20 seconds and throughput exceeding 180 tokens/sec, it is the first choice for real-time UI agents, high-frequency content moderation, and enterprise workloads that require millions of requests per hour without performance degradation.

Documentation Index
Fetch the complete documentation index at: https://documentation.deepmask.io/llms.txt
Use this file to discover all available pages before exploring further.
About Haiku 4.5
Released in late 2025, Claude Haiku 4.5 brings enhanced computer use capabilities to the lightweight tier, making it practical not just for chat and classification but also for simple navigation and form-filling in web browsers and desktop environments. It achieves a 73.2% GPQA Diamond score and a 95% success rate on complex JSON schemas, which makes it reliable for structured tool-calling pipelines at high volume.

Key Capabilities
Instant Tool Calling
Optimized for high-speed function execution with a 95% success rate on complex JSON schemas — reliable enough for production API pipelines.
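As a rough illustration, here is a minimal sketch of a structured tool call using the Anthropic Python SDK. The model identifier, the `lookup_order` tool, and its schema are assumptions for the example, not values taken from this page; check the provider's model list before use.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A single tool with a strict JSON input schema.
tools = [
    {
        "name": "lookup_order",  # hypothetical tool for illustration
        "description": "Fetch the status of a customer order by its ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Internal order identifier"},
            },
            "required": ["order_id"],
        },
    }
]

response = client.messages.create(
    model="claude-haiku-4-5",  # assumed identifier; verify against the current model list
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Where is order A-1042?"}],
)

# The model emits tool_use blocks when it decides to call a tool.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)  # e.g. lookup_order {'order_id': 'A-1042'}
```

In a production pipeline, your code would execute the tool and return the result in a follow-up message so the model can compose its final answer.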
Computer Use (Light)
Handles simple navigation and form-filling within web browsers and desktop environments without requiring a heavier model.
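A hedged sketch of how a computer use request might be structured with the Anthropic Python SDK is shown below. The beta flag and tool version are the ones documented for earlier Claude models and may differ for Haiku 4.5; the model identifier is likewise an assumption.

```python
import anthropic

client = anthropic.Anthropic()

# Ask the model to plan a simple browser action. The response contains
# tool_use blocks (screenshots, clicks, key presses) that your own
# execution loop must carry out and feed back as tool results.
response = client.beta.messages.create(
    model="claude-haiku-4-5",            # assumed identifier
    max_tokens=1024,
    betas=["computer-use-2025-01-24"],   # beta flag documented for earlier Claude models
    tools=[
        {
            "type": "computer_20250124",  # tool version may differ for this model
            "name": "computer",
            "display_width_px": 1280,
            "display_height_px": 800,
        }
    ],
    messages=[{"role": "user", "content": "Open the signup form and fill in the email field."}],
)

for block in response.content:
    if block.type == "tool_use":
        print(block.input)  # e.g. {'action': 'screenshot'} or a click with coordinates
```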
Massive Throughput
Sustains 180+ tokens/sec across enterprise workloads that require millions of requests per hour without performance degradation.
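At that volume, client-side concurrency control matters as much as model speed. Below is a minimal sketch of bounded fan-out with the async Anthropic SDK; the concurrency cap, model identifier, and `classify` helper are assumptions for illustration, and the right cap depends on your account's rate limits.

```python
import asyncio
import anthropic

client = anthropic.AsyncAnthropic()
semaphore = asyncio.Semaphore(50)  # assumed concurrency cap; tune to your rate limits

async def classify(text: str) -> str:
    # Bound in-flight requests so bursts stay within quota.
    async with semaphore:
        response = await client.messages.create(
            model="claude-haiku-4-5",  # assumed identifier
            max_tokens=16,
            messages=[{"role": "user", "content": f"Label the sentiment of: {text}"}],
        )
        return response.content[0].text

async def main() -> None:
    texts = ["great product", "arrived broken", "okay I guess"]
    labels = await asyncio.gather(*(classify(t) for t in texts))
    print(labels)

asyncio.run(main())
```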
Advanced Content Moderation
Context-aware safety filters reduce false positives by 40% compared to previous versions, making moderation pipelines more precise.
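One common way to wire such a filter into a pipeline is to request a machine-readable verdict. The following is a sketch under assumptions: the model identifier, the `moderate` helper, and the policy text are all illustrative, and a production system would validate the JSON rather than trust it blindly.

```python
import json
import anthropic

client = anthropic.Anthropic()

POLICY = "Flag harassment, hate speech, and spam. Allow criticism and profanity without targets."

def moderate(text: str) -> dict:
    # Ask for a strict JSON verdict so the result is machine-readable downstream.
    response = client.messages.create(
        model="claude-haiku-4-5",  # assumed identifier
        max_tokens=128,
        system=f"You are a content moderator. Policy: {POLICY} "
               'Reply with JSON only: {"allowed": bool, "category": str, "reason": str}',
        messages=[{"role": "user", "content": text}],
    )
    # Sketch only: real pipelines should handle malformed or non-JSON replies.
    return json.loads(response.content[0].text)

print(moderate("This movie was garbage and the director should feel bad."))
```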
Best For
Haiku 4.5 is the right choice when speed and cost efficiency matter more than maximum reasoning depth. It is ideal as the execution layer in multi-agent systems, acting as the fast “hands and eyes” for a larger orchestrating model such as Opus or Sonnet. For tasks that require deeper reasoning, complex document analysis, or autonomous multi-step workflows, Sonnet 4.5 or 4.6 is the appropriate step up.

Use Cases
- Customer support chatbots — Near-instant, high-quality responses for global user bases at scale.
- Large-scale data labeling — Classifying millions of records with high accuracy for research and model training pipelines.
- Sub-agent swarms — Acting as the fast execution layer for larger orchestrating models handling repetitive, low-level tasks; see the sketch after this list.
- Real-time content moderation — Context-aware filtering for user-generated content platforms requiring high throughput.
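The sub-agent pattern can be sketched in a few lines: a larger model plans, and Haiku workers execute each step. Everything here is an assumption for illustration; the orchestrator and worker model identifiers and the `plan`/`execute` helpers are not part of this page.

```python
import anthropic

client = anthropic.Anthropic()

def plan(goal: str) -> list[str]:
    # A larger model decomposes the goal into small, repetitive steps.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed orchestrator identifier
        max_tokens=512,
        messages=[{"role": "user", "content": f"List one short subtask per line for: {goal}"}],
    )
    return [line for line in response.content[0].text.splitlines() if line.strip()]

def execute(subtask: str) -> str:
    # Haiku handles each low-level step quickly and cheaply.
    response = client.messages.create(
        model="claude-haiku-4-5",  # assumed worker identifier
        max_tokens=256,
        messages=[{"role": "user", "content": subtask}],
    )
    return response.content[0].text

for task in plan("Summarize each of these three support tickets"):
    print(execute(task))
```

In practice the execute calls would run concurrently, as in the throughput sketch above, with results fed back to the orchestrator for review.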
Specifications
| Specification | Value |
|---|---|
| Provider | Anthropic |
| Context Window | 200K tokens |
| Reasoning | Adaptive (Standard/High) |
| GPQA Diamond | 73.2% |
| Latency (TTFT) | 0.20s |
| Throughput | 180+ tokens/sec |
| Key Use Cases | High-volume support, coding sub-agents, real-time data |