// LLM Cost Calculator

The LLM Cost Calculator.

Compare cost, intelligence, speed, and latency across 100 leading LLMs. Configure your workload and see real costs at scale — built so AI builders can make informed model selection decisions in seconds.

Models tracked: 100 · growing
Source: Artificial Analysis
Last update: May 2026 (LIVE · UPDATED WEEKLY)

Configurator

Your workload

Pick a tier, set your token volume, and the top 5 results update instantly.

Cost Estimate

Estimated cost for your workload

Costs reflect your configured input, output, and request volume. By default, results show the latest model from each tracked provider; pick a specific provider to see its full lineup.
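As a rough illustration of how input, output, and request volume turn into a cost estimate, here is a minimal Python sketch. The function names and the $3/M and $15/M prices are hypothetical placeholders rather than values from the dataset, and the calculator's internal formula is assumed, not confirmed.

```python
def cost_per_run(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request, given prices quoted per 1M tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


def monthly_cost(requests_per_month, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Dollar cost of a month of identical requests."""
    return requests_per_month * cost_per_run(
        input_tokens, output_tokens, input_price_per_m, output_price_per_m
    )


# Hypothetical prices for illustration only (not from the dataset).
run = cost_per_run(10_000, 2_000, input_price_per_m=3.0, output_price_per_m=15.0)
month = monthly_cost(1_000, 10_000, 2_000, 3.0, 15.0)
print(f"per run: ${run:.2f}, per month: ${month:.2f}")
```

Because cost scales linearly with request volume, multiplying the per-run figure by your monthly request count is enough for a first-pass budget.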

All Models · Ranked

// 100 models · sortable
Columns: # · Provider · Model · Intelligence · Blended $/1M · Speed · Latency · Context · Tier · Cost/Run · Monthly

API + MCP Access

Use the data programmatically.

REST API plus an MCP server that lets your AI agents query model pricing, performance, and capability data in real time. Build cost-aware AI apps without scraping leaderboards.

  • JSON REST endpoint for all 100 models
  • MCP server for direct AI agent integration
  • Weekly refresh cadence built in
  • Free tier for non-commercial use
// WAITLIST OPEN · LAUNCHING Q3 2026
# Get the cheapest Frontier model under $1/M tokens
curl https://api.aiarmy.co/llm-cost/v1/models \
  -G --data-urlencode "tier=Frontier" \
  --data-urlencode "max_blended=1.0" \
  --data-urlencode "sort=blended_asc"
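
The same query can be assembled from Python. This sketch only builds the request URL from the endpoint and parameters in the curl example; it makes no network call, since the API has not launched and its response format is not documented here. The helper name `build_models_url` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint copied from the curl example; the service is not live yet.
BASE_URL = "https://api.aiarmy.co/llm-cost/v1/models"


def build_models_url(tier, max_blended, sort="blended_asc"):
    """Return the GET URL for a tier and price filtered model query."""
    params = {"tier": tier, "max_blended": max_blended, "sort": sort}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_models_url("Frontier", 1.0)
print(url)
```

Once the API is available, the resulting URL could be passed to any HTTP client.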

Frequently asked questions.

How is blended cost calculated?

Blended cost combines input and output token pricing into a single $/1M figure, weighted by the share of input and output tokens in a request. The default assumption is 10K input / 2K output tokens per request; the calculator above lets you override it with your actual workload to see model-specific costs.
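
One plausible reading of this weighting is a token-weighted average of the two rates. The sketch below assumes that formula (the site's exact method is not spelled out) and uses hypothetical $3/M and $15/M prices to show how the input-output mix moves the blended figure.

```python
def blended_price_per_m(input_price_per_m, output_price_per_m,
                        input_tokens=10_000, output_tokens=2_000):
    """Token-weighted average price per 1M tokens (assumed formula)."""
    total_tokens = input_tokens + output_tokens
    if total_tokens <= 0:
        raise ValueError("workload must contain at least one token")
    weighted = (
        input_price_per_m * input_tokens + output_price_per_m * output_tokens
    )
    return weighted / total_tokens


# Hypothetical prices for illustration only (not from the dataset).
default_mix = blended_price_per_m(3.0, 15.0)                # 10K in / 2K out
output_heavy = blended_price_per_m(3.0, 15.0, 2_000, 2_000)  # 1:1 mix
print(default_mix, output_heavy)
```

The same pair of list prices yields a higher blended figure as the output share grows, which is why overriding the default ratio matters for generation-heavy workloads.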

What's the cheapest LLM in 2026?

Several open-weight models from providers such as Google, Alibaba, DeepSeek, and OpenAI cost under $0.10/M tokens. Gemma 4 31B, DeepSeek V4 Flash, and gpt-oss-20B all land under that threshold while still scoring above 24 on the Artificial Analysis Intelligence Index. For frontier-tier capability under $1/M tokens, DeepSeek V4 Pro and Kimi K2.6 are typically the price-performance leaders.

How is the Intelligence score calculated?

The Intelligence score uses the Artificial Analysis Intelligence Index methodology: a composite of GPQA Diamond, AIME 2025, SWE-bench, MMLU-Pro, and other public benchmarks, normalized to a 0-100 scale. Scores in the high 50s currently mark the frontier (Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro Preview), with most production models sitting in the 30-50 range.

How much does Claude Opus 4.7 cost?

Claude Opus 4.7 lists at $15/M input tokens and $75/M output tokens. At a typical 10K/2K input-output ratio, one request costs $0.15 for input plus $0.15 for output, or $0.30 in total, which blends to $25/M across the 12K tokens. At 1,000 requests/month with this workload, expect ~$300/month; at 100,000 requests, ~$30,000/month. Use the calculator above to model your specific workload.

Which LLM has the longest context window?

Gemini 3.1 Pro Preview supports up to 10M tokens of context — the longest currently available. Claude Opus 4.7 supports 1M, GPT-5.5 supports 922K, and several open-weight models including Qwen 3.7 Max and DeepSeek V4 Pro support 1M. For most production workloads, 200K-256K is sufficient.

How often is this data updated?

The dataset is verified weekly against vendor pricing pages and the Artificial Analysis leaderboard. New model releases are typically added within 7 days of public availability. The "Last update" badge in the stats band shows the most recent verification date.

What's the difference between input and output token pricing?

Most LLM providers charge separately for input tokens (what you send) and output tokens (what the model generates). Output tokens are typically 3-5× more expensive than input tokens. The "blended" cost is a single normalized figure that approximates the per-token cost at a typical usage ratio. It is useful for high-level comparison, but always check the actual two-rate pricing when budgeting for production.

What's the difference between Frontier, Advanced, Mainstream, and Efficient tiers?

Tiers reflect Intelligence Index ranges. Frontier (50+) is the current capability ceiling — best for complex reasoning, agentic work, and high-stakes outputs. Advanced (40-49) handles most production work well. Mainstream (30-39) suits routine tasks where cost matters more than peak capability. Efficient (under 30) is for high-volume, low-stakes work where cost dominates.
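
The tier boundaries above can be expressed as a small lookup. This is a sketch: the function name `tier_for` is hypothetical, and treating fractional scores such as 49.5 as belonging to the lower tier is an assumption, since the ranges are stated in whole numbers.

```python
def tier_for(intelligence_score):
    """Map an Intelligence Index score to its tier name."""
    # Assumption: each threshold is an inclusive lower bound.
    if intelligence_score >= 50:
        return "Frontier"
    if intelligence_score >= 40:
        return "Advanced"
    if intelligence_score >= 30:
        return "Mainstream"
    return "Efficient"


for score in (57, 44, 30, 24):
    print(score, tier_for(score))
```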

Is there an API for accessing this data programmatically?

The REST API and MCP server are launching Q3 2026. Join the waitlist above to be notified when they go live. Both will be free for non-commercial use with rate limits, and provide the same dataset surfaced on this page plus historical pricing once the trends archive is built.

How do I choose the right model for my use case?

Start with capability requirements — frontier reasoning needs frontier-tier models; routine classification often runs fine on Efficient tier. Then look at cost at your expected volume — the gap between $0.20/M and $4/M compounds quickly at scale. Finally consider latency and context window for your specific UX. The calculator above ranks by total cost for your scenario; sort the full table by Intelligence/Speed/Latency to optimize for other dimensions.
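
The three-step process above (capability first, then cost at volume, then latency and context) can be sketched as a filter followed by a sort. Everything in this example is hypothetical: the `Model` fields, the `shortlist` helper, and the four catalog entries are made up for illustration and do not come from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    intelligence: float
    input_price_per_m: float   # $ per 1M input tokens
    output_price_per_m: float  # $ per 1M output tokens
    latency_s: float
    context_tokens: int


def monthly_cost(model, input_tokens, output_tokens, requests):
    """Monthly dollar cost of `requests` identical calls to `model`."""
    per_run = (
        input_tokens * model.input_price_per_m
        + output_tokens * model.output_price_per_m
    ) / 1_000_000
    return per_run * requests


def shortlist(models, *, min_intelligence, max_latency_s, min_context,
              input_tokens, output_tokens, requests, top_n=5):
    """Keep models that meet the requirements, cheapest first."""
    eligible = [
        m for m in models
        if m.intelligence >= min_intelligence
        and m.latency_s <= max_latency_s
        and m.context_tokens >= min_context
    ]
    eligible.sort(
        key=lambda m: monthly_cost(m, input_tokens, output_tokens, requests)
    )
    return eligible[:top_n]


# Made-up catalog entries for illustration only.
CATALOG = [
    Model("model-a", 55, 5.0, 25.0, 1.2, 1_000_000),
    Model("model-b", 44, 0.5, 2.0, 0.6, 256_000),
    Model("model-c", 41, 0.2, 0.8, 2.5, 128_000),
    Model("model-d", 28, 0.05, 0.2, 0.3, 128_000),
]

picks = shortlist(
    CATALOG, min_intelligence=40, max_latency_s=2.0, min_context=200_000,
    input_tokens=10_000, output_tokens=2_000, requests=1_000,
)
for m in picks:
    print(m.name, round(monthly_cost(m, 10_000, 2_000, 1_000), 2))
```

Filtering on hard requirements before sorting by cost keeps a cheap but unsuitable model from topping the list.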

Spot incorrect pricing or a missing model?

Submit a correction or suggest a model to add. We verify every change before publishing — typically within 7 days.