AI Intel: Mistral Small 4 Goes Full Apache 2, Anthropic Fights the Pentagon, NVIDIA Drops Nemotron 3

AI Intel Open Source GTC Week Published March 17, 2026 · 6 min read

Mistral just dropped a 119-billion-parameter model under Apache 2.0 — the most capable fully open model we've seen this year. Meanwhile, Anthropic is in a legal war with the Pentagon, NVIDIA launched an entire model family at GTC, and OpenAI quietly opened the Sora 2 API to everyone. It's been a week.

Mistral Small 4: 119B Parameters, Apache 2.0, Zero Strings Attached

Mistral released Small 4 yesterday, and the name is misleading. This is a 119B-parameter Mixture-of-Experts model with only 6B active parameters per forward pass. It ships under Apache 2.0 — meaning you can run it, fine-tune it, and deploy it commercially without asking anyone's permission.

The model unifies instruct, reasoning, and multimodal workloads into a single checkpoint. It supports vision input, function calling, and extended reasoning — all in one model you can self-host with vLLM. Multiple checkpoint variants are already on Hugging Face.

Why this matters: the gap between proprietary and open models keeps shrinking. A year ago, you needed GPT-4 or Claude for anything serious. Now a 6B-active-parameter model running on a single GPU can handle most production workloads. For developers building AI products, this changes the cost equation entirely. Instead of paying per-token to an API provider, you can run inference on your own hardware at a fraction of the cost for high-volume use cases.

The timing is also notable — Mistral announced a partnership with NVIDIA alongside this release, signaling that open-source AI and hardware companies are aligning against the closed-model incumbents.

Anthropic vs. The Pentagon: AI's First Constitutional Crisis

The Anthropic situation has escalated from "concerning" to "unprecedented." Here's the timeline: Defense Secretary Pete Hegseth's Department of War labeled Anthropic a "supply chain risk," effectively blacklisting them from defense contracts and pressuring other companies to cut ties. Anthropic sued. Tech giants and former military leaders filed supporting briefs. And now, according to WIRED, President Trump is finalizing an executive order to formally ban Anthropic tools across the entire federal government.

Anthropic's court filings claim this could cost them billions in 2026 revenue. The irony is thick — Anthropic already operates Claude Gov, a version of Claude with relaxed safety restrictions specifically for military use. The government has reportedly been using Claude for target selection in its bombing campaign against Iran. Anthropic's objection isn't about military use; it's about the terms under which that use happens.

For developers, the practical impact is real. If you're building on Claude and your customers include government agencies or government contractors, you're now in a gray zone. The smart move is multi-model architecture — don't bet your product on a single provider when geopolitics can pull the rug overnight. API gateways that let you swap between Claude, GPT-5, and open models with a single parameter change aren't just convenient anymore; they're risk management. If you're exploring alternatives for exactly this reason, services like KissAPI let you route between providers through one OpenAI-compatible endpoint.

NVIDIA Nemotron 3: Built for the Agent Era

NVIDIA used GTC week to launch Nemotron 3 Super, and the architecture is genuinely interesting. It's a hybrid Mamba-Transformer MoE model — combining three different architectural innovations to deliver 5x higher throughput and 2x higher accuracy compared to the previous Nemotron Super.

The key innovation is "Latent MoE" — a technique that compresses tokens before they reach the expert layers, allowing the model to consult 4x as many specialist experts for the same inference cost. It runs in NVFP4 precision on Blackwell GPUs, cutting memory requirements and speeding up inference by up to 4x compared to Hopper.

NVIDIA is positioning this explicitly for agentic AI — systems where models need to reason, plan, use tools, and maintain state across long interactions. The model is open, which means you can deploy it on your own NVIDIA hardware without licensing fees.

The bigger picture: NVIDIA isn't just selling GPUs anymore. They're building the full stack — hardware, models, and frameworks — for the agent economy. AWS announced today they're adding over 1 million NVIDIA GPUs (Blackwell and Rubin architectures) across their global regions starting in 2026. The infrastructure for running these models at scale is being built right now.

Sora 2 API Opens to All Developers

OpenAI officially retired Sora 1 on March 13 and opened the Sora 2 video generation API to all developers. The new version supports up to 20-second video generation and is being integrated directly into ChatGPT.

The API pricing hasn't been fully disclosed yet, but early reports suggest it's expensive — video generation burns through compute at a rate that makes LLM inference look cheap. OpenAI has projected it could spend over $225 billion on inference between 2026 and 2030, and video generation is a big part of that bill.

For most developers, Sora 2 is a "watch and wait" story. The creative applications are obvious — marketing content, product demos, social media — but the cost-per-video needs to come down before it makes sense for anything beyond high-value production work. Netflix acquiring an AI filmmaking company the same week tells you where the industry thinks this is heading.

⚡ Quick Hits

GPT-5.4 computer use is real. Released March 5, it's OpenAI's first model with built-in computer use capabilities — 1M token context, native tool search, and the ability to interrupt and steer its own reasoning. OpenAI and Microsoft are calling it a "production-grade agent" model. GPT-5.1 has been retired and auto-migrated to 5.3/5.4.
API pricing keeps falling. The Reddit community is tracking a clear trend: per-token costs across all major providers have dropped 60-80% in the past 12 months. Claude subscriptions are now 36x more expensive per token than API access for heavy users. If you're still on a $20/month subscription and using the API would be cheaper, it's time to do the math.
QuitGPT movement hits 2.5M. The Cancel ChatGPT movement that started on Reddit has grown to 2.5 million participants. The reasons are mixed — pricing frustration, privacy concerns, and the growing availability of competitive alternatives. OpenAI's response has been to accelerate feature releases (Sora integration, computer use) to justify the subscription price.

Access Every Model Through One API

Claude, GPT-5.4, Mistral, and more — all through one OpenAI-compatible endpoint. Pay-as-you-go, no subscriptions. Switch models with a single parameter.

Try KissAPI Free →