The Problem: Real-Time by Default
When developers first integrate an LLM into their product, they almost always call the API synchronously — send a request, wait for a response, move on. It's the path of least resistance. The problem is that real-time API calls are the most expensive way to use any LLM. You're paying a premium for instant availability, millisecond latency, and guaranteed compute slots on a provider's GPU cluster — even when you don't actually need any of those things.
Take a typical content generation pipeline: a startup producing 50,000 product descriptions per month, each averaging 400 words of output. At standard GPT-5.4 pricing of $15 per million output tokens, that's roughly $1,500/month in output costs alone. The descriptions don't need to be ready in under a second. They need to be ready by morning. That's a critical distinction that most teams miss when first architecting their AI workflows.
What Is the Batch API, Exactly?
OpenAI's Batch API (and Anthropic's equivalent Message Batches API) allows you to submit a file of up to 50,000 requests at once. Instead of processing each request immediately, the provider queues them and guarantees completion within 24 hours. In exchange for this relaxed SLA, they charge exactly half the standard per-token rate — on both input and output tokens.
The mechanism is straightforward: you upload a JSONL file where each line is a self-contained API request, submit the batch, receive a batch ID, and poll or receive a webhook when results are ready. Your prompts are completely unchanged — only the delivery mechanism differs.
Workloads Perfect for Batch Processing
Before migrating, audit your current API usage and ask: "Does this task need to be completed within seconds, or within hours?" Anything in the second category is a batch candidate. Common examples: content generation pipelines, data enrichment and classification, embedding generation for vector search systems, LLM-as-a-judge evaluations, and synthetic training data generation.
The most impactful batch migration we've seen involved a mid-size SaaS company running nightly classification jobs on 200,000 customer support tickets per month. Moving this single workload from real-time to batch cut their monthly API spend from $4,200 to $2,100 — a $25,200 annual saving with two days of engineering work.
The Migration: A Practical Checklist
Moving from synchronous to batch calls is surprisingly low-friction. First, identify your non-interactive API calls — anything triggered by a cron job or background worker that doesn't require an immediate user-facing response. Second, refactor those calls to write requests to a queue rather than firing immediately. Third, build a batch submission job that runs on a schedule, collects queued requests, formats them as a JSONL file, and submits to the API. Fourth, build a results handler that processes completed batches and writes outputs back to your system. For most engineering teams, this takes one to three days of work and pays off within the first billing cycle.
What You Shouldn't Batch
User-facing chat interfaces, real-time voice assistants, and any workflow where a person is actively waiting must remain on the synchronous API. The rule is simple: if a person is staring at a loading spinner, use real-time. If a cron job fired it and no one is watching, use batch. Use the AICostHub calculator to model the exact savings for your workload.