API ready. Haiku 4.5 from $0.25 / M tokens. Get a key →
claude-haiku-4-5 · production-ready

Fast, cheap,
and properly smart.

Haiku 4.5 is the Claude model built to ship in production. Sub-second responses. Twenty-five cents per million input tokens. The same SDK as every other Claude model.

60-second signup · $5 free credit · Drop-in for any Claude model
three_lines_to_haiku.py
import anthropic

response = anthropic.Anthropic().messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=512,
    messages=[{"role": "user",
               "content": "Classify this ticket."}]
)

print(response.content[0].text)
# → returns in ~600ms.
# → costs ~$0.0001 per call.
# → ready to scale to millions.
That's it. That's the entire integration. More recipes →
// the numbers

Built for scale, priced for volume.

Haiku 4.5 is what you reach for when latency matters, cost matters, or both — chatbots, classifiers, agents that fan out into thousands of parallel calls.

$0.25 / M in
Input tokens
$1.25 / M out
Output tokens
200K
Context window
50% off
Batch API discount
~10×
Cheaper than Sonnet
3 lines
To first response
0 migration
Same SDK as every Claude
scale
Built for high throughput
// what to build

Six things made for Haiku.

Not every problem needs a flagship model. Here's where Haiku 4.5 quietly outperforms — same quality on the task, fraction of the cost and latency.

Chat

Customer support bots

Answer common tickets, triage to humans, follow tone guidelines. Haiku's speed makes the chat feel native — not "AI is thinking."

~$1 handles 10,000 short replies
Pipeline

Content moderation

Classify text or images at scale: spam, abuse, policy violations, sentiment, intent. Run as a stream, queue, or batch job.

Batch API → $0.125 / M input tokens
Search

Retrieval-augmented Q&A

Use Haiku as the answer-the-question step on top of your vector search. Cheap enough to run on every query, smart enough to be honest.

200K context → entire doc sets per call
Agent

Tool-using agents

Long-running agents fan out into many small Haiku calls — planning, classifying, looking up, formatting. Pair with Opus for hard reasoning steps.

Tool use + MCP → full support
Compress

Summarization at scale

Distill threads, transcripts, articles, support tickets, or product reviews into clean structured output. Run nightly. Run hourly.

10K transcripts → under $5
Extract

Data extraction

Pull structured data out of messy text — invoices, emails, contracts, web pages. Output JSON. Use the batch endpoint when you can wait.

JSON mode + tool use → reliable
俳句

One model. One call.
Returns before your fingers can
leave the keyboard. Done.

— a haiku for Haiku 4.5 · 5 / 7 / 5
// code recipes

Three things, copy-pasteable.

No tutorial. No setup wizard. Just paste these into your editor, set your ANTHROPIC_API_KEY, and run.

Streaming chat

~200ms TTFB
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 1024,
  messages: [{
    role: "user",
    content: "Help me debug this..."
  }]
});

for await (const event of stream) {
  if (event.type === "content_block_delta" &&
      event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
typescript · streaming Get key →

Batch classifier

50% off
import anthropic

batch = anthropic.Anthropic().messages.batches.create(
  requests=[
    {"custom_id": f"req_{i}",
     "params": {
       "model": "claude-haiku-4-5-20251001",
       "max_tokens": 128,
       "messages": [{
         "role": "user",
         "content": f"Classify: {text}"
       }]
     }}
    for i, text in enumerate(items)
  ]
)
# 50% cheaper, async, ideal for backfill.
python · batch api Get key →

Tool-using agent

low cost / call
import anthropic

response = anthropic.Anthropic().messages.create(
  model="claude-haiku-4-5-20251001",
  max_tokens=1024,
  tools=[{
    "name": "lookup_order",
    "description": "Get an order by id.",
    "input_schema": {"type": "object",
       "properties": {"id": {"type": "string"}}}
  }],
  messages=[{"role": "user",
             "content": "Where's order #4291?"}]
)
# Haiku picks the tool, fills args, ships fast.
python · tool use Get key →

Spin up an API key.
Ship in the next hour.

Sign in, generate a key, paste one of the recipes above into your editor. The first response will land before your coffee gets cold.

Get API Key $5 free credit · no card to start
// stack-up

When you shouldn't use Haiku.

Haiku is the right tool for a specific shape of problem. Here's where it wins, where it loses, and what to do when you need more.

                     Haiku 4.5                  Sonnet 4.6                 Opus 4.7
Best for             High-volume, simple tasks  Most everyday tasks        Hardest reasoning
Speed                Fastest                    Fast                       Thoughtful
Cost (per M input)   $0.25                      $3.00                      $15.00
Cost (per M output)  $1.25                      $15.00                     $75.00
Reasoning depth      ★★★                        ★★★★                       ★★★★★
Context window       200K                       200K                       200K
Tool use / agents    ✓ Full                     ✓ Full                     ✓ Best
Use it when…         Cost and latency matter    Quality + speed balance    Quality is non-negotiable

Pricing reflects published API rates. Always check docs.claude.com for current numbers.

// pricing

Simple. Per token. Pay-as-you-go.

No seats. No tiers. No annual contracts. Sign up, get $5 in free credit, and burn through it on whichever model fits the job.

Model Input ($ / M tokens) Output ($ / M tokens) Context
Haiku 4.5 $0.25 $1.25 200K
Sonnet 4.6 $3.00 $15.00 200K
Opus 4.7 $15.00 $75.00 200K

Batch API: 50% off across all models · Prompt caching: up to 90% savings on repeated context · See full pricing at anthropic.com/pricing.

// faq

Quick answers about Haiku 4.5.

The questions developers ask most often before they switch a workload over.

What's the model string?

claude-haiku-4-5-20251001. Pass it into any Anthropic SDK call exactly where you'd previously pass a Sonnet or Opus model. Zero migration.

How does Haiku 4.5 compare on quality?

For tasks that don't require deep reasoning — classification, extraction, summarization, retrieval Q&A, simple chat — Haiku 4.5 is genuinely close to Sonnet quality at a fraction of the cost and latency. For hard reasoning, complex code, or long-horizon agents, use Sonnet or Opus. See the comparison →

Does Haiku 4.5 support tool use, vision, and streaming?

Yes — full support. Tool use, JSON mode, vision (image inputs), streaming, the batch API, prompt caching, and MCP integrations all work the same way they do on Sonnet and Opus.

How fast is "ultra fast"?

Time-to-first-token is typically a few hundred milliseconds. For short replies you'll often see end-to-end completion under one second. Exact numbers depend on prompt length, region, and load — measure on your own workload before committing.

Can I use the Batch API?

Yes — Haiku supports the Batch API at a 50% discount, ideal for backfills, periodic jobs, or anything where latency isn't critical. With prompt caching layered on top, costs can drop another 50–90%.

What's the context window?

200,000 tokens — the same as Sonnet and Opus. You can drop in long documents, transcripts, or RAG context and Haiku reasons over it cleanly.

Is there a free tier for the API?

Yes. New API accounts include $5 in free credit, which goes a long way on Haiku — enough to run thousands of short calls before you'd add a payment method. Sign up at console.anthropic.com.

Can I use Haiku 4.5 in Claude.ai or Claude Code?

Yes. Claude.ai paid plans include Haiku as a switchable model in the dropdown. Claude Code accepts --model claude-haiku-4-5-20251001 for fast, cheap routine tasks.

Will my API data be used for training?

By default, Anthropic does not use API data to train models. Enterprise customers can additionally enable zero data retention. Full details at anthropic.com/privacy.

How do I move an existing workload from Sonnet to Haiku?

Change the model string. That's the migration. Test on a sample of your real traffic, compare outputs, watch for any quality regressions on the harder edge cases, and route those back to Sonnet or Opus if needed. Most teams ship the change in under a day.

Build the next thing.
Ship it on Haiku.

Get an API key, paste one of the recipes, and watch a response stream back in under a second. The whole loop takes less time than reading this page.

$ npm install @anthropic-ai/sdk · pip install anthropic · $5 free credit

Get API Key →