AI Intel: Anthropic Closes the Claude Buffet + Mythos Looms + Open Models Keep Coming
The biggest Reddit story this weekend was not a benchmark chart. It was Anthropic shutting down the cheap trick many power users had leaned on: using Claude Pro or Max subscription limits inside third-party agentic tools. That move says a lot about where the market is heading. Frontier labs are tightening access, token costs are back in focus, and open models are getting good enough that you can no longer treat them like a side project.
Anthropic just ended the “unlimited Claude” loophole
What happened: Anthropic said that starting April 4 at noon Pacific, Claude Pro subscribers paying $20 per month and Max subscribers paying $100 to $200 per month can no longer spend their subscription allowance inside third-party harnesses such as OpenClaw. If you still want Claude in those tools, you now need API billing or Anthropic’s new extra-usage bundles. To soften the hit, Anthropic offered a one-time credit equal to the monthly plan price through April 17 and up to 30% off prepaid extra-usage bundles.
Why it matters: The all-you-can-eat phase of coding agents was always a pricing glitch. Agents do not behave like ordinary chat users. They retry, branch, diff files, summarize context, call tools, then do it again. One heavy coding session can burn through more compute than a week of casual chatting. Anthropic finally priced for that reality. Reddit understood the subtext immediately: if usage looks like API traffic, the lab wants API economics.
Developer angle: If your workflow depended on subscription arbitrage, reprice it now. Put hard caps on autonomous loops, use prompt caching where your stack supports it, and keep a fallback model ready. This is also why multi-model routing matters more now than it did a month ago. If you want Claude, GPT, and Gemini behind one OpenAI-compatible endpoint, services like KissAPI start to make more sense now that a fallback path is no longer optional.
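A minimal sketch of those caps in practice. The budget numbers, model names, and the `call_model` stub are illustrative assumptions; in real code that stub would be an OpenAI-compatible chat call to your provider or gateway:

```python
# Hypothetical sketch: hard-cap an autonomous agent loop and fall back to a
# cheaper model when the primary provider errors or the burn rate gets high.

MAX_STEPS = 8            # hard cap on autonomous iterations
TOKEN_BUDGET = 200_000   # total tokens one task is allowed to spend

def call_model(model: str, prompt: str):
    """Stand-in for a real OpenAI-compatible call; returns (text, tokens_used)."""
    # A real implementation would hit the provider API here.
    return f"[{model}] answer to: {prompt}", len(prompt) * 4

def run_capped(prompt: str, primary="claude-opus-4.6", fallback="gemini-3.1-pro"):
    spent, steps, result = 0, 0, None
    model = primary
    while steps < MAX_STEPS and spent < TOKEN_BUDGET:
        steps += 1
        try:
            result, used = call_model(model, prompt)
        except Exception:
            model = fallback          # provider error: swap models, keep going
            continue
        spent += used
        if spent > TOKEN_BUDGET // 2: # burning fast: finish on the cheap model
            model = fallback
        if result:                    # a real loop would check task completion here
            break
    return result, spent, steps
```

The point is not the specific thresholds; it is that both the step count and the token spend are bounded before the loop starts, so a retry storm cannot silently turn into a four-figure bill.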
The Mythos leak still hangs over Anthropic
What happened: Reddit’s weekly Anthropic threads are still obsessed with Mythos. The leak trail from late March pointed to a model tier above Opus, reportedly tied to the Capybara codename and already in limited partner testing. Anthropic has not done a full public launch, but the leaked material was clear enough that nobody serious thinks Mythos is fiction anymore.
Why it matters: This is the other half of the cutoff story. Anthropic is tightening current capacity while a heavier model sits behind the curtain. That tells you the frontier game is splitting in two: premium first-party experiences on one side, metered API spend on the other. The middle ground — cheap unlimited access through wrappers and clever hacks — is disappearing fast.
Developer angle: Do not build your roadmap around a rumored model. Build for swappability instead. Provider abstraction, model-specific evals, and clean fallbacks matter more than guessing Mythos launch day. The team that wins this quarter will not be the one that waited for Mythos. It will be the one that can swap Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro, or a Qwen/Gemma fallback without breaking product logic.
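One way to picture that swappability, as a hedged sketch: product logic calls a single `complete` function against a model registry, never a vendor SDK directly. The registry API here is a made-up example; the model names and per-million prices come from the discussion in this issue:

```python
# Hypothetical provider-abstraction sketch: swapping models is a registry
# change, not a product-logic change.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelSpec:
    name: str
    input_price: float    # USD per 1M input tokens
    output_price: float   # USD per 1M output tokens
    call: Callable[[str], str]

REGISTRY: dict[str, ModelSpec] = {}

def register(spec: ModelSpec) -> None:
    REGISTRY[spec.name] = spec

def complete(model_name: str, prompt: str) -> str:
    # The only entry point product code uses; evals and fallbacks key off the
    # model name, so Opus, GPT, Gemini, or a Qwen/Gemma fallback all slot in.
    return REGISTRY[model_name].call(prompt)

# Stub callables stand in for real provider clients.
register(ModelSpec("claude-opus-4.6", 15.0, 75.0, lambda p: f"opus:{p}"))
register(ModelSpec("gemini-3.1-pro", 2.0, 12.0, lambda p: f"gemini:{p}"))
```

With this shape, a rumored model like Mythos becomes one more `register` call on launch day instead of a rewrite.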
GPT-5.4 vs Opus 4.6 is really a pricing story now
What happened: Reddit still cannot agree on a single winner between GPT-5.4 and Claude Opus 4.6, and that may be the clearest signal of all. GPT-5.4 is still the expensive bet at about $30 per million input tokens and $180 per million output tokens. Claude Opus 4.6 sits closer to $15 and $75. Gemini 3.1 Pro keeps blowing up the comparison threads because it is priced around $2 input and $12 output. OpenAI’s new Codex rate card made the economics even harder to ignore: GPT-5.4 is listed at 62.5 credits per 1M input tokens and 375 credits per 1M output tokens, and OpenAI says average Codex usage lands around $100 to $200 per developer each month.
Why it matters: The old question was “which model is smartest?” The new question is “which part of my stack deserves the expensive model?” Agent workflows magnify every pricing mistake. A planner call, retrieval pass, test rerun, review step, and fix pass can turn one task into five bills. That is why token anxiety has moved from finance Slack channels into day-to-day product design.
Developer angle: Route cheaper models to summarization, extraction, routing, and boilerplate. Save Opus or GPT-5.4 for the final hard step. Budget for output tokens, not just prompt size, because agents talk back a lot more than people expect. If you are comparing providers this quarter, compare total workflow cost, not leaderboard screenshots.
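The arithmetic behind that advice, using the per-million prices quoted above. The per-step token counts are made-up illustrative numbers; only the prices come from this issue:

```python
# Back-of-the-envelope cost for one five-step agent task, priced two ways.
PRICES = {                       # (input USD per 1M, output USD per 1M)
    "gpt-5.4":         (30.0, 180.0),
    "claude-opus-4.6": (15.0, 75.0),
    "gemini-3.1-pro":  (2.0, 12.0),
}

def step_cost(model, input_tokens, output_tokens):
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

def workflow_cost(steps):
    """steps: list of (model, input_tokens, output_tokens) for one task."""
    return sum(step_cost(*s) for s in steps)

# Planner, retrieval pass, test rerun, review, then the final hard step.
mixed = [
    ("gemini-3.1-pro",  20_000, 2_000),   # planning
    ("gemini-3.1-pro",  50_000, 1_000),   # retrieval summary
    ("gemini-3.1-pro",  30_000, 4_000),   # test triage
    ("gemini-3.1-pro",  15_000, 3_000),   # review
    ("claude-opus-4.6", 40_000, 8_000),   # final hard step on the frontier model
]
all_frontier = [("claude-opus-4.6", i, o) for (_, i, o) in mixed]
```

With these token counts the mixed pipeline comes in well under half the all-Opus price, and note how much of the frontier step's cost is output tokens, not the prompt.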
Open models are no longer the backup plan
What happened: Open-model momentum kept building this week. Google released Gemma 4 as an Apache 2.0 family aimed at reasoning and agentic workflows, while Alibaba’s Qwen 3.6-Plus kept showing up in Reddit comparisons thanks to its 1M-token context window and strong coding chatter. The mood has changed. Open models are not being discussed like hobby toys anymore.
Why it matters: Closed labs are still ahead at the very top, but the gap below the frontier keeps shrinking. Every time an open or low-cost model becomes good enough for 60% or 70% of a pipeline, it drags pricing pressure back onto the big labs. That is one reason this market feels jumpy right now: the smart-money move is often not “pick one winner,” but “split the workload.”
Developer angle: Use open models for glue work: classification, reranking, simple tool calls, log triage, maybe even first-draft code. Keep the frontier model for the last expensive mile. That mix is how you stay sane when token usage turns into an ops problem instead of a budget note at the end of the month.
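That split can be as simple as a task-type router. A minimal sketch, assuming illustrative model names and a hypothetical set of task labels:

```python
# Hypothetical router: glue work goes to an open model, the last expensive
# mile goes to the frontier model. Task labels and model IDs are assumptions.
GLUE_TASKS = {"classification", "reranking", "tool_call", "log_triage", "draft_code"}

def pick_model(task_type: str) -> str:
    if task_type in GLUE_TASKS:
        return "gemma-4"            # open model: cheap, good enough for glue work
    return "claude-opus-4.6"        # frontier model for the hard final step
```

A static lookup like this is deliberately boring: it is auditable, it makes spend predictable, and you can tighten or loosen `GLUE_TASKS` as your evals show where the open model holds up.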
Quick Hits
- Claude Code’s source-map leak is still being dissected. Once 500,000-plus lines of a coding agent land on the internet, they do not disappear. Expect more copycat ideas and more enterprise security questions this week.
- OpenAI’s $122 billion funding round changes the pricing backdrop. A company sitting on that much capital has room to subsidize market share fights longer than smaller rivals would like.
- Meta is already back in rumor season. After Llama 4 landed with mixed reactions, Reddit is treating every hint of a follow-up model like a live fire drill.
Need one endpoint for Claude, GPT, Gemini, and more?
KissAPI gives you OpenAI-compatible access to multiple top models so you can route by cost, latency, and task difficulty instead of betting your whole app on one vendor.
Try KissAPI Free →