AI Intel: OpenAI Kills Sora, LiteLLM Gets Backdoored, and ARC-AGI-3 Humbles Every Frontier Model

AI Intel · Security · Benchmarks · March 26, 2026 · 6 min read

OpenAI just killed its most hyped product. Sora, the video generation tool that was supposed to revolutionize creative work, is being shut down barely 15 months after launch. Meanwhile, someone backdoored LiteLLM, the proxy library that a huge share of AI applications route their model calls through, and a new benchmark just showed that frontier models still fail at games humans figure out on first contact. It's been a week.

OpenAI Shuts Down Sora: $17B Burn Rate Claims Another Victim

On Monday, OpenAI announced it's pulling the plug on Sora, its AI video generation app. The Guardian, Variety, CNBC, Ars Technica, and CNN all reported the news within hours. This isn't a pivot or a rebrand. It's a full shutdown.

The timing is brutal. Just three months ago, OpenAI signed a three-year deal with Disney giving Sora users access to 200+ licensed characters from Marvel, Pixar, and Star Wars. Disney has now dropped its planned $1 billion investment in OpenAI, with a spokesperson offering the kind of diplomatic statement that translates to "we're disappointed but won't say it."

Why it died: compute costs. Sora 2, which launched in September 2025, could generate stunningly realistic video — but at a cost that didn't make business sense when OpenAI is already burning through $17 billion a year. With an IPO on the horizon, every product needs to justify its GPU allocation. Sora couldn't.

For developers, the lesson is clear: don't build on features that drain your provider's margins. Video generation APIs from Runway, Pika, and Kling are still running, but they're all facing the same economics. If you're integrating video gen into a product, have a fallback plan.
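One way to make that fallback plan concrete is to put every video provider behind the same call signature and try them in order. This is a minimal sketch under stated assumptions: the provider functions are hypothetical placeholders, not the Runway, Pika, or Kling SDKs, and each is assumed to take a prompt and return a job or video identifier.

```python
from typing import Callable, Sequence

# Hypothetical signature: a provider takes a prompt and returns a video/job ID.
VideoProvider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain fails."""


def generate_with_fallback(
    prompt: str, providers: Sequence[tuple[str, VideoProvider]]
) -> tuple[str, str]:
    """Try each (name, provider) in order and return (name, result) from the first success."""
    errors: list[str] = []
    for name, provider in providers:
        try:
            return name, provider(prompt)
        except Exception as exc:  # real code should catch each SDK's specific errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers (placeholders, not real SDK calls).
def shut_down_provider(prompt: str) -> str:
    raise ConnectionError("service discontinued")


def backup_provider(prompt: str) -> str:
    return f"video-for:{prompt}"


name, result = generate_with_fallback(
    "a cat surfing", [("primary", shut_down_provider), ("backup", backup_provider)]
)
print(name, result)  # backup video-for:a cat surfing
```

Catching a bare `Exception` keeps the sketch short; a production integration would also add timeouts and retries before falling through to the next provider.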

LiteLLM Supply Chain Attack: Your API Keys May Be Compromised

This one should scare you. On March 24, a threat actor called TeamPCP published backdoored versions of the litellm Python package (versions 1.82.7 and 1.82.8) to PyPI. The attack vector was clever: they compromised a Trivy GitHub Action in LiteLLM's CI/CD pipeline, stole PyPI credentials, and uploaded malicious packages directly — bypassing all official build workflows.

The compromised versions included a .pth file (litellm_init.pth). Python's site module processes .pth files in site-packages every time the interpreter starts, so the payload ran in every Python process in an affected environment, whether or not your code ever imported litellm. It functioned as both a credential stealer and a dropper, encrypting and exfiltrating API keys, environment variables, and other secrets.
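Why a .pth file makes such a good hiding spot: at startup, Python's site module reads every .pth file in the site-packages directories and executes any line that begins with `import` followed by a space or tab. The sketch below lists those executable lines so you can review them, and marks litellm_init.pth by name. Some legitimate packages (setuptools, editable installs) also ship .pth files with import lines, so treat the output as a review list, not a verdict.

```python
import site
from pathlib import Path


def executable_pth_lines(dirs: list[str]) -> dict[str, list[str]]:
    """Map each .pth file in `dirs` to the lines the site module would execute at startup."""
    findings: dict[str, list[str]] = {}
    for d in dirs:
        for pth in sorted(Path(d).glob("*.pth")):
            try:
                lines = pth.read_text(encoding="utf-8", errors="replace").splitlines()
            except OSError:
                continue  # unreadable file: skip rather than crash the audit
            # site executes a .pth line only if it starts with "import" plus a space or tab.
            hits = [ln for ln in lines if ln.startswith(("import ", "import\t"))]
            if hits:
                findings[str(pth)] = hits
    return findings


def site_dirs() -> list[str]:
    """Global and user site-packages directories for the running interpreter."""
    dirs = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
    dirs.append(site.getusersitepackages())
    return [d for d in dirs if Path(d).is_dir()]


if __name__ == "__main__":
    for path, lines in executable_pth_lines(site_dirs()).items():
        marker = "  <-- named in the LiteLLM reports" if path.endswith("litellm_init.pth") else ""
        print(f"{path}{marker}")
        for ln in lines:
            print("    " + ln[:120])  # truncate long one-liners for readability
```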

LiteLLM gets three million daily downloads. Sonatype's automated tooling caught it within seconds, but the window of exposure was real. Snyk, JFrog, and Endor Labs all published independent analyses confirming the attack.

What to do right now:

Check if you're running litellm 1.82.7 or 1.82.8: pip show litellm
If yes, remove that version (and any leftover litellm_init.pth in site-packages), then immediately rotate every API key that environment could reach: Anthropic, OpenAI, all of them
Pin your dependencies. Use pip freeze or lockfiles. Stop running pip install --upgrade in production without review
Consider using an API gateway that isolates your provider credentials from your application code. If your app only knows your gateway key, a compromised dependency can steal that key but not your upstream provider keys
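The version check and the pinning advice above can be partly scripted. This is a minimal, standard-library-only sketch: `importlib.metadata` reads the installed litellm version, and a deliberately rough heuristic flags requirement lines not pinned with `==`. The known-bad set holds only the two versions named above; a version missing from it is not thereby certified clean.

```python
from importlib.metadata import PackageNotFoundError, version

# The two versions named in the reports above; extend if advisories list more.
COMPROMISED_LITELLM = {"1.82.7", "1.82.8"}


def is_compromised(installed: str) -> bool:
    return installed in COMPROMISED_LITELLM


def check_litellm() -> str:
    """Report on the litellm installed in the current environment."""
    try:
        installed = version("litellm")
    except PackageNotFoundError:
        return "litellm is not installed here"
    if is_compromised(installed):
        return f"litellm {installed} is a known-bad version: remove it, then rotate keys"
    return f"litellm {installed} is not one of the known-bad versions"


def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines not pinned to an exact version with '=='."""
    flagged = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --hash
        if "==" not in line:
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    print(check_litellm())
    print(unpinned("requests==2.31.0\nlitellm>=1.80  # floating!\nflask\n"))
    # ['litellm>=1.80', 'flask']
```

The pin check is intentionally naive (it would misread URL requirements containing `#`); a lockfile tool is the sturdier answer, and this is only a quick audit.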

This is the second major supply chain attack targeting AI infrastructure this year. The AI toolchain is becoming a high-value target precisely because these libraries handle expensive API credentials.

ARC-AGI-3: The Benchmark That Makes GPT-5 Look Like a Toddler

The ARC Prize Foundation launched ARC-AGI-3 yesterday at Y Combinator, and the results are humbling. The best AI agent scored 12.58% during the 30-day preview period. Frontier LLMs — GPT-5, Claude, Gemini, all of them — scored under 1%. Humans score 100%.

ARC-AGI-3 is fundamentally different from previous versions. Instead of static grid puzzles, agents must explore video-game-like environments with no instructions, no stated rules, and no stated goals. They see a visual state, take an action, observe the result, and have to work out what they're supposed to do on the fly, with zero prior exposure.
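To make the observe, act, observe loop concrete, here is a toy sketch. Everything in it is hypothetical: `CorridorEnv` is not an ARC-AGI-3 environment and does not use the arc-agi package's interface, and the agent's strategy (count-based curiosity: try untried actions first, then head for the least-visited outcome) is one simple exploration heuristic, not what any leaderboard entry does. The agent is never told that reaching the right end wins; it only learns that when the environment reports `done`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CorridorEnv:
    """Hypothetical toy game, NOT the arc-agi package API.

    The agent observes only its cell index; the win condition
    (reach the rightmost cell) is never revealed to it.
    """
    width: int = 8
    pos: int = 0
    actions: tuple = ("left", "right", "stay")

    def step(self, action: str) -> tuple[int, bool]:
        if action == "left":
            self.pos = max(0, self.pos - 1)
        elif action == "right":
            self.pos = min(self.width - 1, self.pos + 1)
        return self.pos, self.pos == self.width - 1  # (observation, done)


def explore(env: CorridorEnv, max_steps: int = 200) -> int:
    """Observe -> act -> observe loop with count-based curiosity.

    Untried (observation, action) pairs go first; otherwise pick the action whose
    remembered outcome has been visited least. Returns the number of actions taken.
    """
    visits: defaultdict[int, int] = defaultdict(int)
    effects: dict[tuple[int, str], int] = {}  # (observation, action) -> resulting observation
    obs = env.pos
    visits[obs] += 1
    for step in range(1, max_steps + 1):
        action = min(
            env.actions,
            key=lambda a: -1 if (obs, a) not in effects else visits[effects[(obs, a)]],
        )
        new_obs, done = env.step(action)
        effects[(obs, action)] = new_obs
        visits[new_obs] += 1
        obs = new_obs
        if done:
            return step
    return max_steps


print(explore(CorridorEnv()))  # 26
```

On this 8-cell corridor the curious agent needs 26 actions, while a player who already knew the goal would need 7. That gap between exploring and knowing is the kind of inefficiency the benchmark is built to measure.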

The benchmark includes hundreds of handcrafted environments with thousands of levels. Scoring measures action efficiency against human baselines collected from 1,200+ players across 3,900+ games. It's not pass/fail — it's "how much do you wander compared to a human who's also seeing this for the first time?"
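The efficiency idea can be illustrated with a simple ratio, with an important caveat: the formula below (human baseline actions divided by agent actions, capped at 1, with unfinished levels scoring 0) is an assumption for illustration, not the ARC Prize Foundation's published scoring method. The action counts in the example are made up.

```python
from typing import Optional, Sequence, Tuple


def level_score(agent_actions: Optional[int], human_actions: int) -> float:
    """Illustrative only: human baseline actions / agent actions, capped at 1.0.

    None means the agent never finished the level, which scores 0.
    """
    if agent_actions is None:
        return 0.0
    if agent_actions <= 0 or human_actions <= 0:
        raise ValueError("action counts must be positive")
    return min(1.0, human_actions / agent_actions)


def overall(results: Sequence[Tuple[Optional[int], int]]) -> float:
    """Mean of per-level scores over (agent_actions, human_actions) pairs."""
    return sum(level_score(a, h) for a, h in results) / len(results)


# Made-up numbers: a human-efficient level, one with twice the wandering, one failed.
print(overall([(7, 7), (14, 7), (None, 7)]))  # 0.5
```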

The toolkit is MIT-licensed, pip-installable (pip install arc-agi), and supports inference from OpenAI, Anthropic, Google, DeepSeek, and others out of the box. There's over $2 million in prize money across three tracks, and all winning solutions must be open-sourced with no external API calls allowed.

This matters because it tests something few current benchmarks capture: genuine learning from interaction. Most other benchmarks test recall, pattern matching, or reasoning over provided information. ARC-AGI-3 tests whether an AI can figure out an unfamiliar world by poking at it, the way a child learns. Right now, frontier models can't.

Claude's 1,487% Usage Surge: What's Driving the Migration

Anthropic is having a moment. According to Appfigures data, Claude's daily active users grew 183% between January and March 2026. Session-level data is even more dramatic: usage jumped from 1,100 sessions in mid-January to 17,000 by mid-March — a 1,487% increase in under two months.
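A quick check of the arithmetic, using the rounded figures as quoted: going from 1,100 to 17,000 sessions is roughly a 1,445% increase, not 1,487%. Working backwards, a 1,487% increase ending at 17,000 implies a starting point near 1,071, which rounds to the 1,100 quoted, so the headline percentage was presumably computed from unrounded data.

```python
def pct_increase(start: float, end: float) -> float:
    """Percentage increase from start to end."""
    return (end - start) / start * 100


def implied_start(end: float, pct: float) -> float:
    """Starting value that a given percentage increase would need to reach `end`."""
    return end / (1 + pct / 100)


print(f"{pct_increase(1_100, 17_000):,.0f}%")  # 1,445%
print(f"{implied_start(17_000, 1_487):,.0f}")  # 1,071
```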

The numbers behind the numbers: Anthropic reported a $14 billion annualized revenue run rate in its February 2026 fundraising round. Claude Code alone has a $2.5 billion run rate with over 300,000 business customers. Ramp data shows Anthropic's business software subscriptions grew 4.9% month-over-month in February while OpenAI's share fell 1.5%.

What's driving it? Claude Code. The CLI coding tool hit a nerve with developers who were tired of IDE-integrated AI that felt clunky. Claude Code is fast, opinionated, and works in your terminal. It's also expensive — users on Reddit regularly report burning through their API credits faster than expected, especially with extended thinking enabled.

For anyone running heavy Claude workloads, the cost question is real. Opus 4.6 at $15 per million input tokens and $75 per million output tokens adds up fast when you're doing multi-file refactoring sessions. API gateways like KissAPI that offer pay-as-you-go access without subscriptions are seeing increased demand from exactly this crowd: developers who want Claude's quality without committing to a $200/month Max plan they might not fully use.
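To see how quickly that adds up, here is a back-of-the-envelope estimator using the prices quoted above ($15 per million input tokens, $75 per million output tokens); check current published pricing before relying on it. The session size is a hypothetical example, the sketch assumes any extended-thinking tokens are already included in the output count, and it ignores prompt-caching discounts.

```python
# USD per million tokens, as quoted in this article (verify against current pricing).
INPUT_PRICE_PER_MTOK = 15.00
OUTPUT_PRICE_PER_MTOK = 75.00


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one session (multiply before dividing to keep round numbers exact)."""
    return (input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Hypothetical multi-file refactor: long context re-sent across many turns.
cost = session_cost(input_tokens=2_000_000, output_tokens=200_000)
print(f"${cost:.2f} per session")  # $45.00 per session
print(f"{200 / cost:.1f} such sessions cost as much as a $200/month plan")
```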

⚡ Quick Hits

DeepSeek teases new models. The Chinese AI lab has been dropping hints about upcoming releases. No official announcement yet, but the Reddit AI community is watching closely. If DeepSeek-V4 matches the value proposition of V3 (near-frontier quality at a fraction of the cost), it could shake up the API pricing landscape again.
Claude Code stability issues. Multiple Reddit threads report Claude Code hanging, dropping connections, and producing inconsistent outputs during long sessions. Anthropic hasn't acknowledged the issues publicly. If you're hitting these, shorter sessions with explicit checkpoints may help.
LM Studio malware concerns. Reports surfaced on Reddit about potential malware in LM Studio downloads. Unconfirmed, but worth checking your download source if you installed it recently. Stick to the official site and verify checksums.
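Verifying a checksum takes only a few lines of Python. This sketch assumes you already have the expected SHA-256 value from the publisher's official site; it streams the file in 1 MiB chunks so large installers don't need to fit in memory. The file name in the usage comment is a placeholder.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare against the published value, tolerating case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholders): python verify.py LM-Studio-installer.exe <published-sha256>
    file_path, expected = sys.argv[1], sys.argv[2]
    print("OK" if matches(file_path, expected) else "MISMATCH: do not run this file")
```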

That's your Wednesday briefing. The theme this week: the AI stack is maturing, and with maturity comes real security threats, real product failures, and benchmarks that remind us how far we still have to go. Build carefully.

Access Every Model Through One API

Claude, GPT-5, DeepSeek, Gemini — all through one OpenAI-compatible endpoint. Pay-as-you-go, no subscriptions. Your API keys stay with us, not in your application code.
