Haiku 4.5 is the Claude model built to ship in production. Sub-second responses. Twenty-five cents per million input tokens. The same SDK as every other Claude model.
```python
import anthropic

response = anthropic.Anthropic().messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=512,
    messages=[{"role": "user", "content": "Classify this ticket."}],
)
print(response.content[0].text)
# → returns in ~600ms.
# → costs ~$0.0001 per call.
# → ready to scale to millions.
```
Haiku 4.5 is what you reach for when latency matters, cost matters, or both — chatbots, classifiers, agents that fan out into thousands of parallel calls.
Not every problem needs a flagship model. Here's where Haiku 4.5 quietly outperforms — same quality on the task, fraction of the cost and latency.
Answer common tickets, triage to humans, follow tone guidelines. Haiku's speed makes the chat feel native — not "AI is thinking."
Classify text or images at scale: spam, abuse, policy violations, sentiment, intent. Run as a stream, queue, or batch job.
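A minimal classifier sketch of that pattern (the label set and the fallback behavior here are illustrative choices, not part of the API):

```python
LABELS = ["spam", "abuse", "policy_violation", "negative_sentiment", "ok"]

def moderation_prompt(text: str) -> str:
    # Constrain the model to a fixed label set so output parsing stays trivial.
    return (
        "Classify the following text as exactly one of: "
        + ", ".join(LABELS)
        + ". Reply with the label only.\n\n"
        + text
    )

def classify(text: str) -> str:
    import anthropic  # pip install anthropic
    response = anthropic.Anthropic().messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=8,
        messages=[{"role": "user", "content": moderation_prompt(text)}],
    )
    label = response.content[0].text.strip().lower()
    return label if label in LABELS else "ok"  # fall back on unexpected output
```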
Use Haiku as the answer-the-question step on top of your vector search. Cheap enough to run on every query, smart enough to be honest.
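That answer-the-question step can be sketched as follows; the retrieval side is whatever vector store you already run, so `chunks` here simply stands in for your search results:

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    # Stuff the retrieved chunks into the prompt and ask Haiku to stay grounded.
    context = "\n\n".join(chunks)
    return (
        "Answer the question using ONLY the context below. "
        "If the context doesn't contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

def answer(query: str, chunks: list[str]) -> str:
    import anthropic  # pip install anthropic
    response = anthropic.Anthropic().messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": build_prompt(query, chunks)}],
    )
    return response.content[0].text
```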
Long-running agents fan out into many small Haiku calls — planning, classifying, looking up, formatting. Pair with Opus for hard reasoning steps.
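A sketch of that fan-out using the SDK's `AsyncAnthropic` client; the subtask strings are placeholders for whatever your planner emits:

```python
import asyncio

def build_requests(subtasks: list[str]) -> list[dict]:
    # One small Haiku request per subtask: plan, classify, look up, format.
    return [
        {
            "model": "claude-haiku-4-5-20251001",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": task}],
        }
        for task in subtasks
    ]

async def fan_out(subtasks: list[str]) -> list[str]:
    from anthropic import AsyncAnthropic  # pip install anthropic
    client = AsyncAnthropic()

    async def run(params: dict) -> str:
        response = await client.messages.create(**params)
        return response.content[0].text

    # All subtasks in flight at once; per-call latency stays low.
    return await asyncio.gather(*(run(p) for p in build_requests(subtasks)))
```

`asyncio.gather` returns results in input order, so you can zip them straight back onto the subtask list.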
Distill threads, transcripts, articles, support tickets, or product reviews into clean structured output. Run nightly. Run hourly.
Pull structured data out of messy text — invoices, emails, contracts, web pages. Output JSON. Use the batch endpoint when you can wait.
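A hedged sketch of that extraction loop; the invoice fields and the fence-stripping parser are illustrative assumptions, not a fixed schema:

```python
import json

def parse_model_json(raw: str) -> dict:
    # Models sometimes wrap JSON in markdown fences; strip them before parsing.
    cleaned = raw.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(cleaned)

def extract_invoice(text: str) -> dict:
    import anthropic  # pip install anthropic
    response = anthropic.Anthropic().messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": "Extract vendor, date, and total from this invoice as "
                       f"JSON with keys vendor, date, total:\n\n{text}",
        }],
    )
    return parse_model_json(response.content[0].text)
```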
One model. One call. Returns before your fingers can leave the keyboard. Done.
No tutorial. No setup wizard. Just paste these into your editor, replace the API key, run.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-haiku-4-5-20251001",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Help me debug this..." }],
});

// Print each text delta as it streams in.
for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
```
```python
import anthropic

items = ["Where's my refund?", "Love the new update!"]  # your texts to classify

batch = anthropic.Anthropic().beta.messages.batches.create(
    requests=[
        {
            "custom_id": f"req_{i}",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 128,
                "messages": [{"role": "user", "content": f"Classify: {text}"}],
            },
        }
        for i, text in enumerate(items)
    ]
)
# 50% cheaper, async, ideal for backfill.
```
```python
import anthropic

response = anthropic.Anthropic().messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    tools=[{
        "name": "lookup_order",
        "description": "Get an order by id.",
        "input_schema": {
            "type": "object",
            "properties": {"id": {"type": "string"}},
        },
    }],
    messages=[{"role": "user", "content": "Where's order #4291?"}],
)
# Haiku picks the tool, fills args, ships fast.
```
Sign in, generate a key, paste one of the recipes above into your editor. The first response will land before your coffee gets cold.
Haiku is the right tool for a specific shape of problem. Here's where it wins, where it loses, and what to do when you need more.
|  | Haiku 4.5 | Sonnet 4.6 | Opus 4.7 |
|---|---|---|---|
| Best for | High-volume / low-latency | Most everyday tasks | Hardest reasoning |
| Speed | Ultra fast | Fast | Thoughtful |
| Cost (per M input) | $0.25 | $3.00 | $15.00 |
| Cost (per M output) | $1.25 | $15.00 | $75.00 |
| Reasoning depth | ★★★ | ★★★★ | ★★★★★ |
| Context window | 200K | 200K | 200K |
| Tool use / agents | ✓ Full | ✓ Full | ✓ Best |
| Use it when… | Cost & latency win | Quality + speed balance | Quality is non-negotiable |
Pricing reflects published API rates. Always check docs.claude.com for current numbers.
No seats. No tiers. No annual contracts. Sign up, get $5 in free credit, and burn through it on whichever model fits the job.
| Model | Input ($ / M tokens) | Output ($ / M tokens) | Context |
|---|---|---|---|
| Haiku 4.5 | $0.25 | $1.25 | 200K |
| Sonnet 4.6 | $3.00 | $15.00 | 200K |
| Opus 4.7 | $15.00 | $75.00 | 200K |
Batch API: 50% off across all models · Prompt caching: up to 90% savings on repeated context · See full pricing at anthropic.com/pricing.
The questions developers ask most often before they switch a workload over.
**What's the exact model string?**
`claude-haiku-4-5-20251001`. Pass it into any Anthropic SDK call exactly where you'd previously pass a Sonnet or Opus model. Zero migration.
**How close is the quality to Sonnet?**
For tasks that don't require deep reasoning — classification, extraction, summarization, retrieval Q&A, simple chat — Haiku 4.5 is genuinely close to Sonnet quality at a fraction of the cost and latency. For hard reasoning, complex code, or long-horizon agents, use Sonnet or Opus. See the comparison →
**Does Haiku 4.5 support tool use, vision, and streaming?**
Yes — full support. Tool use, JSON mode, vision (image inputs), streaming, the batch API, prompt caching, and MCP integrations all work the same way they do on Sonnet and Opus.
**How fast is it in practice?**
Time-to-first-token is typically a few hundred milliseconds. For short replies you'll often see end-to-end completion under one second. Exact numbers depend on prompt length, region, and load — measure on your own workload before committing.
**Can I use the Batch API with Haiku?**
Yes — Haiku supports the Batch API at a 50% discount, ideal for backfills, periodic jobs, or anything where latency isn't critical. With prompt caching layered on top, costs can drop another 50–90%.
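A sketch of prompt caching layered onto a classification call; the policy document is a hypothetical stand-in for your own long, stable context, while the `cache_control` block is the Messages API's marker for a cacheable prefix:

```python
def cached_classify_params(ticket: str, policy_doc: str) -> dict:
    # The long, unchanging policy document is marked cacheable; only the short
    # ticket varies call to call, so repeat calls reuse the cached prefix.
    return {
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 64,
        "system": [{
            "type": "text",
            "text": f"Classify tickets against this policy:\n\n{policy_doc}",
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": ticket}],
    }

def classify_ticket(ticket: str, policy_doc: str) -> str:
    import anthropic  # pip install anthropic
    response = anthropic.Anthropic().messages.create(
        **cached_classify_params(ticket, policy_doc)
    )
    return response.content[0].text
```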
**What's the context window?**
200,000 tokens — the same as Sonnet and Opus. You can drop in long documents, transcripts, or RAG context and Haiku reasons over it cleanly.
**Is there a free tier?**
Yes. New API accounts include $5 in free credit, which goes a long way on Haiku — enough to run thousands of short calls before you'd add a payment method. Sign up at console.anthropic.com.
**Can I use Haiku in Claude.ai or Claude Code?**
Yes. Claude.ai paid plans include Haiku as a switchable model in the dropdown. Claude Code accepts `--model claude-haiku-4-5-20251001` for fast, cheap routine tasks.
**What happens to my data?**
By default, Anthropic does not use API data to train models. Enterprise customers can additionally enable zero data retention. Full details at anthropic.com/privacy.
**How hard is it to switch an existing workload?**
Change the model string. That's the migration. Test on a sample of your real traffic, compare outputs, watch for any quality regressions on the harder edge cases, and route those back to Sonnet or Opus if needed. Most teams ship the change in under a day.
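That routing fallback can be sketched like this; the `is_hard` heuristic is hypothetical and the Sonnet model id is a placeholder you'd replace with the id from the docs:

```python
HAIKU = "claude-haiku-4-5-20251001"
SONNET = "claude-sonnet-4-6"  # placeholder -- use the Sonnet id from the docs

def pick_model(prompt: str) -> str:
    # Hypothetical difficulty heuristic: tune this against your own traffic.
    is_hard = len(prompt) > 4000 or "```" in prompt
    return SONNET if is_hard else HAIKU

def ask(prompt: str) -> str:
    import anthropic  # pip install anthropic
    response = anthropic.Anthropic().messages.create(
        model=pick_model(prompt),
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```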
Get an API key, paste one of the recipes, and watch a response stream back in under a second. The whole loop takes less time than reading this page.
`npm install @anthropic-ai/sdk` · `pip install anthropic` · $5 free credit