Basics & Getting Started

What exactly is a "token" and why does it matter?

A token is a chunk of text that the AI model processes. It's not exactly a word — think of it more like a syllable or character cluster. On average, 1 token ≈ 0.75 words, so 100 words = roughly 133 tokens. AI providers charge per million tokens, not per word or request. If you send a 500-word prompt and get back a 300-word response, you just burned ~1,065 tokens. Multiply that by thousands of requests and it adds up fast. Understanding tokens is the single most important thing for controlling costs.
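Here's a rough sketch of that arithmetic in Python. The 0.75 words-per-token ratio is the rule of thumb from this answer, not a real tokenizer count, and the function name is made up for illustration.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text; real tokenizers vary by model and language


def estimate_tokens(word_count: int) -> int:
    """Approximate token count for a given number of English words."""
    return round(word_count / WORDS_PER_TOKEN)


prompt_tokens = estimate_tokens(500)    # 500-word prompt
response_tokens = estimate_tokens(300)  # 300-word response
total = prompt_tokens + response_tokens
print(prompt_tokens, response_tokens, total)
```

For exact counts, run your real prompts through the provider's own tokenizer. This estimate is only good enough for budgeting.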

I'm just starting out. Should I even worry about costs yet?

Honestly? If you're doing under 1,000 requests a month, your bill will be under $5-10 — not worth overthinking. But the moment you hit product-market fit and scale to 10K+ requests, costs become real. A friend of mine went from $20/month to $2,400/month in two weeks after launching on Product Hunt. Plan ahead. At minimum, set up billing alerts and understand the batch API before you need it.

How do I calculate my monthly token usage?

Use this formula: (monthly requests) × (avg words per prompt) × 1.33 = input tokens. Then estimate your output tokens (usually 0.5x to 2x your input, depending on the task). Plug those into our calculator above. Example: 10,000 requests, 500 words per prompt = 6.65M input tokens. If outputs average 300 words each, that's 4M output tokens. At GPT-5.4 pricing, that's $16.63 (input) + $60 (output) = ~$77/month.
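If you'd rather script it than use the calculator, the formula could look something like this. It's a minimal sketch: the function name is hypothetical, and the prices are the GPT-5.4 figures quoted in this FAQ, passed in as plain numbers so you can swap in your own.

```python
TOKENS_PER_WORD = 1.33  # rule of thumb from the formula above


def monthly_cost(requests, words_in, words_out, price_in_per_m, price_out_per_m):
    """Return (input_tokens, output_tokens, dollars) for one month of traffic."""
    input_tokens = requests * words_in * TOKENS_PER_WORD
    output_tokens = requests * words_out * TOKENS_PER_WORD
    dollars = (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m
    return input_tokens, output_tokens, dollars


# 10,000 requests, 500 words in, 300 words out, $2.50/M input, $15/M output
tokens_in, tokens_out, cost = monthly_cost(10_000, 500, 300, 2.50, 15.00)
print(f"{tokens_in / 1e6:.2f}M in, {tokens_out / 1e6:.2f}M out, ${cost:.2f}/month")
```

The script lands a few cents under the worked example because it doesn't round 3.99M output tokens up to 4M.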

Pricing & Models

Why are output tokens so much more expensive than input?

Generating new text takes more compute per token than reading existing text. The model processes your whole prompt (input) in a single parallel pass, which uses the hardware efficiently. When it generates a response (output), it has to predict one token at a time, with each step depending on the one before, so a long answer means hundreds or thousands of sequential passes. That's why output is usually 3-6x more expensive. For GPT-5.4: input is $2.50/M, output is $15/M, a 6x difference.

Is DeepSeek actually good or just cheap?

DeepSeek V4 is legitimately capable for a ton of use cases — classification, extraction, simple summarization, translation, basic Q&A. It's not going to match GPT-5 or Claude 4.6 on complex reasoning or creative writing, but at $0.28 per million tokens (both input and output), it's a no-brainer for high-volume, straightforward tasks. I use it for tagging support tickets and it works perfectly. Don't use it for mission-critical analysis, but for volume work? Absolutely.

Do API prices ever go down, or just up?

They actually go down surprisingly often. OpenAI launched GPT-4o in 2024 at half the price of GPT-4 Turbo, then cut it again a few months later. Anthropic shipped Claude 3.5 Sonnet at the same price as Claude 3 Sonnet despite a big capability jump, so the cost per unit of quality still fell. As compute gets cheaper and models get more efficient, providers compete on price. That said, the newest flagship models (like GPT-5.4 or Claude 4.6) tend to launch expensive and get cheaper, or get undercut by a successor, over 6-12 months. If cost matters more than having the absolute latest model, wait a few months after launch.

Are there any truly free AI APIs?

Nothing powerful is free at scale, but most providers have free tiers or trial credits for testing. OpenAI and Anthropic have both offered new accounts around $5 in credits, and Google's Gemini has had a generous free tier for low-volume use (around 15 requests per minute). These offers change often, so check the current terms before you count on them. They're fine for prototyping, but once you hit production traffic, you'll need a paid account. Don't build a business on free tiers, because they can disappear or get rate-limited without warning.

Batch Processing

What's the catch with the 50% batch discount?

The only catch is time. Your requests get queued and processed within 24 hours instead of instantly. That's it. Same model, same quality, same outputs, just delivered hours later instead of seconds later. If your use case doesn't need real-time results (content generation, data processing, nightly jobs), there's no real downside. I moved 80% of my workload to batch and cut my OpenAI bill from $1,200 to about $720/month (20% still at full price, 80% at half price).
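To see what a partial move to batch does to your own bill, a quick sketch like this works. The function name is hypothetical, and it assumes the flat 50% batch discount discussed here applies to everything you move.

```python
def blended_bill(baseline: float, batch_share: float, batch_discount: float = 0.50) -> float:
    """Monthly bill after moving `batch_share` of the spend to a discounted batch tier."""
    realtime_part = baseline * (1 - batch_share)                # still billed at full price
    batch_part = baseline * batch_share * (1 - batch_discount)  # billed at the batch rate
    return realtime_part + batch_part


# A $1,200/month bill with 80% of the workload moved to batch
new_bill = blended_bill(1200, 0.80)
print(f"${new_bill:.0f}/month")
```

Note that the savings are capped by how much of your traffic can actually wait. Even at 100% batch, you save 50%, not more.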

Can I mix batch and real-time API calls in the same app?

Absolutely. That's the ideal setup. Use the real-time API for anything user-facing (chat, live autocomplete, interactive features), and the batch API for background tasks (content pipelines, data enrichment, scheduled reports). Most production apps run a hybrid architecture. The batch API is a separate endpoint with a submit-now, collect-later workflow, but your prompts stay exactly the same.
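The split can be as simple as a routing rule in front of your LLM calls. This is a toy sketch, not any provider's SDK: the task labels and the Router class are invented for illustration, and the two lists stand in for "call the real-time API now" and "add to tonight's batch file".

```python
from dataclasses import dataclass, field

USER_FACING = {"chat", "autocomplete", "interactive"}  # hypothetical task labels


@dataclass
class Router:
    realtime: list = field(default_factory=list)  # stand-in for immediate API calls
    batch: list = field(default_factory=list)     # stand-in for the queued batch job

    def submit(self, task_type: str, prompt: str) -> str:
        """Send the prompt down the right path and return the path name."""
        if task_type in USER_FACING:
            self.realtime.append(prompt)
            return "realtime"
        self.batch.append(prompt)
        return "batch"


router = Router()
router.submit("chat", "Summarize this ticket for the agent")
router.submit("nightly_report", "Write the weekly usage summary")
```

Defaulting unknown task types to batch is a deliberate choice here: the cheap path is the fallback, and anything that needs to be instant has to say so.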

How long does a batch job actually take to complete?

In practice, most batches finish in 2-8 hours, not the full 24. OpenAI and Anthropic process batches continuously and make results available as soon as the batch is done. I've had small batches (under 1,000 requests) finish in under an hour. Large batches (50K requests) might take 10-12 hours. To know when it's ready, poll the status endpoint, or use a webhook where the provider supports one.
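If you go the polling route, the loop might look roughly like this. It's a sketch under assumptions: get_status is a placeholder for whatever call your provider's SDK gives you, and the status names are guesses, so check them against the real API.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed", "expired", "cancelled"}  # assumed status names


def wait_for_batch(get_status: Callable[[], str], interval_s: float = 60.0,
                   timeout_s: float = 24 * 3600, sleep=time.sleep) -> str:
    """Poll until the batch reaches a terminal state or the timeout passes."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"batch still '{status}' after {waited:.0f}s")
        sleep(interval_s)  # a minute between polls is plenty for a multi-hour job
        waited += interval_s
```

The sleep parameter is injectable only so the loop can be tested without actually waiting.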

Cost Optimization

What's the single biggest mistake people make with LLM costs?

Using the biggest, most expensive model for everything. I see developers defaulting to GPT-5.4 or Claude 4.6 Sonnet for tasks that GPT-5 Nano or DeepSeek could handle perfectly. It's like renting a Ferrari to drive to the grocery store. Match the model to the task. Classification? Use Nano. Creative writing? Use Sonnet. Simple extraction? Use DeepSeek. You'll cut costs by 80-95% on half your workload.
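One low-effort way to enforce this is a lookup table instead of a hard-coded model name. A minimal sketch follows. The model strings are placeholders named after the models in this answer, not real API identifiers.

```python
# Placeholder model names; substitute the real identifiers from your provider.
MODEL_FOR_TASK = {
    "classification": "gpt-5-nano",
    "extraction": "deepseek",
    "creative_writing": "claude-sonnet",
}
DEFAULT_MODEL = "gpt-5-nano"  # start cheap, escalate only when quality falls short


def pick_model(task_type: str) -> str:
    """Return the model assigned to a task type, falling back to the cheap default."""
    return MODEL_FOR_TASK.get(task_type, DEFAULT_MODEL)


print(pick_model("classification"), pick_model("creative_writing"))
```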

How can I reduce output token costs specifically?

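Ask for shorter responses. Seriously. Add "Be concise" or "Limit response to 100 words" to your system prompt. Use structured outputs (JSON) instead of prose when possible. JSON syntax has its own overhead, but a fixed schema with short keys cuts out the filler sentences that pad a prose answer. If you're generating content, generate outlines first, then expand only what you need. And always set max_tokens limits so the model can't run wild. Keep in mind that max_tokens cuts the output off rather than making the model more concise, so pair it with the length instruction.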
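A max_tokens cap also gives you a hard ceiling on output spend, which is easy to compute. This is a small sketch with a made-up function name. The $15/M price is the GPT-5.4 output rate quoted elsewhere in this FAQ.

```python
def worst_case_output_cost(max_tokens: int, requests: int, price_out_per_m: float) -> float:
    """Upper bound on monthly output spend if every response hits the max_tokens cap."""
    return max_tokens * requests * price_out_per_m / 1e6


# A 100-word limit is roughly 133 tokens, so cap at 150 to leave some headroom
ceiling = worst_case_output_cost(max_tokens=150, requests=10_000, price_out_per_m=15.00)
print(f"at most ${ceiling:.2f}/month on output")
```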

Should I cache prompts to save money?

Yes, but be strategic. If you're sending the same prompt prefix repeatedly (like a fixed system prompt or few-shot examples), use prompt caching, which both Anthropic and OpenAI offer. The cached part isn't free on later requests, but it's billed at a steep discount, and you pay full price only for the new content. Caches also expire after a short idle period, so this works best with steady traffic. It can save 30-50% on input costs for apps with long system prompts. But don't cache unique content, because that defeats the purpose.
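To get a feel for the numbers, here's a back-of-the-envelope sketch. Everything in it is an assumption to replace with your provider's published rates: the 1.25x write and 0.10x read multipliers are illustrative defaults, the function name is invented, and it optimistically assumes the cache never expires between requests.

```python
def input_cost_with_cache(requests: int, cached_tokens: int, fresh_tokens: int,
                          price_in_per_m: float,
                          read_multiplier: float = 0.10,    # assumed discount on cache hits
                          write_multiplier: float = 1.25):  # assumed premium on the first write
    """Input cost when a shared prefix is written once and read on every later request."""
    write = cached_tokens * write_multiplier
    reads = cached_tokens * read_multiplier * (requests - 1)
    fresh = fresh_tokens * requests  # per-request content always pays full price
    return (write + reads + fresh) * price_in_per_m / 1e6


def input_cost_without_cache(requests, cached_tokens, fresh_tokens, price_in_per_m):
    """Baseline: every request pays full price for the whole prompt."""
    return (cached_tokens + fresh_tokens) * requests * price_in_per_m / 1e6


# 1,000 requests, 2,000-token shared prefix, 500 new tokens each, $3/M input
with_cache = input_cost_with_cache(1_000, 2_000, 500, 3.00)
without_cache = input_cost_without_cache(1_000, 2_000, 500, 3.00)
```

Real savings come in lower than this best case, since every cache expiry means paying the write price again.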

Is it worth fine-tuning a model to reduce costs?

Only at serious scale. Fine-tuning can reduce the tokens needed per request (because the model "knows" your domain better, so you can drop long instructions and examples), but the upfront cost and ongoing maintenance usually only make sense above $5K-10K/month in API spend. For most startups, smart prompting + model selection + batch API get you 90% of the savings with 1% of the effort.

Choosing Providers

Should I stick to one provider or use multiple?

Most teams start with one (usually OpenAI) for simplicity, then branch out as they scale. Using multiple providers gives you pricing leverage, redundancy if one goes down, and the ability to pick the best model for each task. But it adds complexity — you need abstraction layers, separate API keys, different SDKs. My advice: start with one, switch or add a second only when you have a clear cost or capability reason.

Does Anthropic's Claude offer better value than OpenAI?

Depends on your use case. Claude 4.6 Sonnet is $3 input / $15 output vs GPT-5.4's $2.50 / $15 — so slightly more expensive on input, same on output. But Claude excels at long-context tasks (200K+ tokens) and tends to follow instructions more precisely, which can reduce retry loops. For creative writing and analysis, many developers prefer Claude. For structured output and function calling, GPT-5 is often easier. Test both and measure which one gets you the result faster.
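Since list prices are this close, retries can matter more than the rate card. Here's a sketch for comparing the two on the same workload. The prices are the ones quoted in this answer, the model keys are just labels, and the retry rates in the usage lines are invented placeholders, so plug in what you actually measure.

```python
# (input $/M, output $/M) as quoted in this FAQ; keys are labels, not API model IDs
PRICES = {
    "claude-4.6-sonnet": (3.00, 15.00),
    "gpt-5.4": (2.50, 15.00),
}


def workload_cost(model: str, input_m: float, output_m: float, retry_rate: float = 0.0) -> float:
    """Monthly cost for a workload in millions of tokens, inflated by the share of retried calls."""
    price_in, price_out = PRICES[model]
    base = input_m * price_in + output_m * price_out
    return base * (1 + retry_rate)


# Hypothetical retry rates, purely to show how the comparison can flip
a = workload_cost("claude-4.6-sonnet", 6.65, 4.0, retry_rate=0.02)
b = workload_cost("gpt-5.4", 6.65, 4.0, retry_rate=0.10)
print(f"${a:.2f} vs ${b:.2f}")
```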

What about local/open-source models to avoid API costs?

Running Llama 3 or Mistral locally means no per-token fees, but it isn't free: you need GPUs, devops expertise, and time. For small teams, cloud APIs are almost always cheaper when you factor in engineering time and infrastructure. If you're doing millions of requests per month and have ML engineers on staff, self-hosting makes sense. Otherwise, stick with APIs. The math flips around $10K-20K/month in API spend.

Billing & Limits

How do I avoid surprise bills?

Set spending limits in your provider dashboard. OpenAI, Anthropic, and Google all offer some form of monthly budget or spend control, but check whether yours is a hard stop or only a notification, because the behavior differs by provider and account type. Set the limit to 2x what you expect to use, and set up email alerts at 50%, 75%, and 90% of that limit. Also monitor your usage daily for the first month after launch, since costs can spike unexpectedly if a user finds a way to abuse your API or if your prompts are way longer than expected.
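The cap and alert levels are quick to work out from your expected spend. A tiny sketch, with a hypothetical function name and the 2x multiplier and 50/75/90% levels taken from the advice above:

```python
def alert_thresholds(expected_monthly: float, cap_multiplier: float = 2.0,
                     levels=(0.50, 0.75, 0.90)):
    """Return (spend_cap, alert_amounts) in dollars for an expected monthly spend."""
    cap = expected_monthly * cap_multiplier
    return cap, [round(cap * level, 2) for level in levels]


# Expecting about $500/month
cap, alerts = alert_thresholds(500)
print(cap, alerts)
```

With a 2x cap, the 50% alert fires right at your expected spend, so it doubles as a "you're on track" signal.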

Do AI providers charge for failed requests?

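Generally no. If a request fails with an error (rate limit, server error, invalid input), you're not charged. You only pay for successful completions. The trap is requests that succeed on the provider's side but count as failures in your code: a client-side timeout after the model has already generated, a response cut off at max_tokens, or output that fails your validation. Those may all be billed, and if your code retries them in a loop without proper backoff logic, you can rack up costs from repeated attempts. Always implement exponential backoff and max retry limits.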
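A bare-bones version of backoff with a retry cap might look like this. It's a generic sketch, not tied to any SDK: request_fn stands in for your API call, and catching every exception is a simplification, since real code should retry only errors that are actually retryable (rate limits, 5xx).

```python
import random
import time


def call_with_backoff(request_fn, max_retries: int = 5, base_delay_s: float = 1.0,
                      max_delay_s: float = 30.0, sleep=time.sleep):
    """Call request_fn, retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the error instead of looping forever
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)  # 1s, 2s, 4s, ... capped
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retry bursts
```

The max_retries cap is what protects your bill. With the defaults, a permanently failing call costs you at most six attempts.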

Can I get volume discounts from providers?

Yes, but you need to be spending serious money. OpenAI and Anthropic offer custom enterprise pricing once you're consistently above $5K-10K/month. Below that, you're stuck with published rates. If you're spending $20K+/month, definitely reach out to their enterprise sales teams — you can usually negotiate 10-30% off. Google is reportedly more flexible on pricing than others.