AI Intel: Anthropic’s Mythos Leak + Claude’s Limit Crunch + More
Anthropic owned the AI news cycle for the wrong reasons this week. A CMS mistake exposed details of Mythos, an unreleased model the company says is its most capable yet, and almost at the same time Claude users were flooding Reddit with complaints about tighter peak-hour limits. If you wanted a snapshot of where the industry is heading, this was it: better models, scarcer compute, less patience.
Meanwhile, the open-source crowd kept doing what it does best: shaving real costs off inference. And outside the model war, the web itself is changing shape as machine traffic starts to overtake humans. Here’s the Saturday AI Intel briefing.
Anthropic leaked Mythos before launch
What happened. Fortune reported that Anthropic accidentally left nearly 3,000 unpublished assets accessible through a public content management cache. The leak exposed draft material for Claude Mythos and references to a new premium tier called Capybara that sits above Opus. Anthropic later confirmed it is testing a new general-purpose model with early access customers and called it “a step change” and the most capable model it has built so far. The leaked draft said the model is especially strong in reasoning, coding, and cybersecurity, and expensive enough that Anthropic is taking a slow rollout approach.
Why it matters. This is more than an embarrassing leak. It tells you Anthropic thinks there is room for a tier above Opus, which means top-end enterprise demand is still strong even after a year of price pressure. It also suggests the next model wave will be sold less as “chat but smarter” and more as “high-end coding and cyber infrastructure.” That is a different market, with different budgets.
Developer angle. Now is a good time to stop hardcoding your stack around today’s top model. A new premium tier means more routing choices, stricter evals, and more pressure to reserve expensive models for tasks that really need them. This is exactly where an OpenAI-compatible layer like KissAPI helps: test Claude, GPT, Gemini, and DeepSeek against the same workflow, then spend premium tokens only where the gain is obvious.
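A routing layer does not have to be complicated. Here is a minimal sketch of the idea against an OpenAI-compatible gateway; the task categories and model names are illustrative placeholders, not real KissAPI identifiers:

```python
# Map task types to models so premium tokens are spent only where the
# gain is obvious. Names below are illustrative, not real model IDs.
ROUTES = {
    "bulk_triage": "cheap-model",      # high volume, low stakes
    "summarize":   "mid-tier-model",   # routine work
    "hard_coding": "premium-model",    # only tasks that justify the price
}
DEFAULT_MODEL = "mid-tier-model"       # safe fallback for unknown task types

def pick_model(task_type: str) -> str:
    """Return the model ID to send with an OpenAI-compatible request."""
    return ROUTES.get(task_type, DEFAULT_MODEL)

def build_request(task_type: str, prompt: str) -> dict:
    """The chosen ID slots straight into a standard chat-completion payload."""
    return {
        "model": pick_model(task_type),
        "messages": [{"role": "user", "content": prompt}],
    }
```

Because the payload shape is the same for every provider behind the gateway, swapping a route when a new premium tier ships is a one-line change, and your evals can run the same prompts through every entry in the table.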
Claude’s session-limit backlash is not just a Reddit tantrum
What happened. Anthropic posted an official update on Reddit saying Claude’s 5-hour session limits now burn faster during peak weekday hours, defined as 5 a.m. to 11 a.m. PT or 1 p.m. to 7 p.m. GMT. Weekly limits are supposedly unchanged, but Anthropic said about 7% of users will hit a limit they would not have hit before. The comments were brutal. Pro and Max users said long coding sessions, background jobs, and even a small number of prompts were chewing through quota much faster than expected.
Why it matters. The issue is not just communication. It is a product-market mismatch. Users are treating Claude like work infrastructure, but the service still behaves like a consumer subscription with guardrails. Once people are running agent loops and long coding sessions, “weekly limits are unchanged” is not comforting. What matters is whether the tool disappears halfway through a deadline.
Developer angle. Build as if your favorite model will be rate-limited, slower, or briefly unavailable at the worst moment. Keep tasks chunked, checkpoint long jobs, and maintain a fallback path. The cost side matters too: Claude Opus 4.6 still sits around $15 per million input tokens and $75 per million output tokens, while Sonnet 4.6 is closer to $3 and $15. Those numbers are fine when you control usage. They are painful when your workflow assumes endless runway.
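Put numbers on that runway. A quick sketch using the per-million-token prices quoted above to compare what the same job costs on each tier (prices as reported here; check current rate cards before relying on them):

```python
# USD per million tokens, using the figures quoted in this briefing.
PRICES = {
    "opus":   {"input": 15.0, "output": 75.0},
    "sonnet": {"input": 3.0,  "output": 15.0},
}

def job_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD for one job at the listed per-million-token rates."""
    p = PRICES[model]
    return (tokens_in * p["input"] + tokens_out * p["output"]) / 1_000_000

# A long agent session: 2M tokens in, 400K tokens out.
print(job_cost("opus", 2_000_000, 400_000))    # 60.0
print(job_cost("sonnet", 2_000_000, 400_000))  # 12.0
```

A 5x spread per session is exactly why routing only the hard tasks to the premium tier pays off.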
TurboQuant is the kind of AI progress developers actually feel
What happened. Google’s new TurboQuant research claims it can cut the memory needed to run large language models by as much as 6x through more aggressive KV-cache compression. CNBC reported that the news immediately hit memory-chip stocks, with SK Hynix down 6% and Samsung down nearly 5% in the first reaction. Reddit cared about something more useful: implementation. One llama.cpp contributor said skipping roughly 90% of KV dequant work delivered a 22.8% decode speedup at 32K context on Qwen3.5-35B-A3B running on an M5 Max, with no measurable hit to perplexity.
Why it matters. Efficiency changes the market faster than another benchmark flex. If a model gets cheaper to run, more hardware can host it, more teams can afford it, and local inference gets more credible. That matters more to builders than one more abstract leaderboard win.
Developer angle. Watch the inference stack, not just model launches. Improvements in llama.cpp, MLX, and KV-cache compression may save more money this year than switching from one frontier API to another. A fast 35B model with long-context efficiency can be better business than a premium API that looks amazing in screenshots and expensive in production.
Machines are now a first-class user of the internet
What happened. Human Security’s State of AI Traffic report says automated traffic grew almost eight times faster than human activity in 2025. The company says AI-related traffic rose 187% from January to December 2025, while agentic traffic jumped nearly 8,000%. The report draws from more than one quadrillion interactions processed through its platform.
Why it matters. The old web assumed a person was sitting behind every request. That assumption is dying. AI agents now browse docs, fill forms, call APIs, scrape pages, and trigger workflows at meaningful scale. The “dead internet” meme is still exaggerated, but it is no longer detached from reality.
Developer angle. Design for machine traffic on purpose. That means rate limits tied to identity, clearer structured docs, better bot policies, and endpoints that survive automation. Helpful agents and abusive bots are being built from the same underlying ingredients. If your product only works for careful human clicking, it is already aging badly.
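“Rate limits tied to identity” can start as small as a per-client token bucket. A minimal sketch, not production code; the rate and burst numbers are arbitrary, and a real deployment would add persistence and distinguish trusted agents from anonymous scrapers:

```python
import time

class TokenBucket:
    """Allow bursts up to `burst`, refilling at `rate_per_sec` tokens/second."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per caller identity (API key, signed agent ID, etc.).
buckets: dict[str, TokenBucket] = {}

def check_request(client_id: str) -> bool:
    bucket = buckets.setdefault(client_id, TokenBucket(rate_per_sec=2.0, burst=5))
    return bucket.allow()
```

Keying the bucket on identity rather than IP is the point: it lets you give a well-behaved agent a generous quota while throttling the anonymous flood, instead of treating all machine traffic as abuse.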
Quick Hits
- GLM 5.1 was one of the biggest stories on LocalLLaMA, and users are already watching for an open-weights release around April 6 or 7.
- The GPT-4o/GPT-5 complaints megathread on r/ChatGPT is sitting at more than 4,000 comments. OpenAI still has the biggest audience, but not the most goodwill.
- Anthropic’s temporary 2x off-peak bonus ends after March 28, so expect another round of quota screenshots next week.
The pattern today is hard to miss. AI is getting stronger, but the real winners will be the teams that handle routing, cost, memory, and reliability better than everyone else. Model hype still gets the clicks. Infrastructure is still what decides who ships.
Need one endpoint for fast model switching?
Use KissAPI to test Claude, GPT, Gemini, DeepSeek, and more through one OpenAI-compatible API. Compare quality, latency, and cost without rebuilding your stack every week.
Get Started Free →