State of AI Coding Agents — May 2026: Grok Build and Gemini 3.5 Flash Crash the Party
May 2026 was the busiest month for AI coding agents since the original Claude Code launch. xAI shipped Grok Build with 8 parallel sub-agents and a local-first design; Google's Gemini 3.5 Flash now beats frontier models on coding at one-third the price. This is what changed — and what didn't.
The 30-second version
- xAI’s Grok Build launched May 14, 2026. Terminal agent, 8 parallel sub-agents, local-first (no source uploaded). Early beta, gated to SuperGrok Heavy subscribers. Model:
grok-code-fast-1, 70.8% on SWE-Bench Verified, $0.20 per million input tokens. - Google’s Gemini 3.5 Flash announced at Google I/O 2026. 76.2% on Terminal-Bench 2.1, 4× output tokens/sec vs frontier models, $1.50 per million input tokens. Google calls it the strongest Flash model they’ve ever shipped for coding.
- The old guard (Claude Code, Cursor, Codex) didn’t lose ground but pricing tightened. Claude Code is being tested out of the $20 Pro plan; Codex moved to token-based billing in April.
What’s new this month
Grok Build (xAI)
Grok Build is xAI’s first dedicated coding agent. Three things make it different from the existing field:
- 8 parallel sub-agents. A complex task is split across up to eight concurrent agents that each work through a plan → search → build loop. Multi-file refactors hit from multiple directions at once instead of sequentially. (sdd.sh writeup)
- Local-first by design. No source code is transmitted to xAI servers. For teams handling proprietary or regulated codebases, this isn’t marketing — it’s the architectural choice. Most other agents stream context to inference.
- Arena Mode (coming, not yet live). An automated evaluation layer that scores and ranks competing outputs before a human ever reviews them. xAI confirmed it in code traces in February 2026; not in the current early beta. (testingcatalog.com)
The underlying model grok-code-fast-1 is separate from the Grok 4 lineage — trained from scratch with heavy programming content and post-trained on real PRs. SWE-Bench Verified: 70.8%, at $0.20 per million input tokens — substantially cheaper than Claude Sonnet or GPT-5 if you’re paying API rates.
Access right now: SuperGrok Heavy subscription only. Not in general availability yet.
Honest take: the parallel-agent angle is genuinely new for a CLI coding agent. The local-first design is a real differentiator for regulated industries. But Arena Mode — the most-marketed feature — isn’t live yet, so judging it head-to-head with Claude Code’s autonomous loop is premature.
Gemini 3.5 Flash (Google)
Google announced Gemini 3.5 Flash at I/O 2026 as the first model in the new Gemini 3.5 family. Numbers worth knowing:
| Benchmark | Gemini 3.5 Flash | Notes |
|---|---|---|
| Terminal-Bench 2.1 | 76.2% | strongest Flash result ever for coding |
| GDPval-AA | 1656 Elo | agentic task benchmark |
| MCP Atlas | 83.6% | MCP-tool-use benchmark |
| Output tokens/sec | ~4× faster | vs frontier models (per Google’s measurements) |
| Price | $1.50 / M input tokens | roughly 1/3 of comparable frontier models |
The Neowin and New Stack writeups confirm Flash beats Gemini 3.1 Pro and lands within ~2 points of Anthropic’s flagship at one-third the price. (R&D World Online)
Honest take: Flash isn’t a wrapper / IDE / CLI — it’s a model. To use it for coding you call it via Gemini CLI, the Vertex AI API, or third-party clients (Cursor and Continue already added Gemini 3.5 routing on launch day). The story here isn’t a new agent; it’s that the price-per-capability for coding just dropped a lot, which puts margin pressure on every paid agent that bills you for inference.
What the old guard did this month
Claude Code (Anthropic) — pricing turbulence
Anthropic started testing removal of Claude Code from the $20 Pro plan on April 21, 2026 — on a small percentage of new prosumer signups. Existing Pro subscribers were grandfathered. For new users, the entry point is now Max 5× at $100/mo or Max 20× at $200/mo. (The Register)
The Max plans share usage limits between Claude (chat) and Claude Code on a 5-hour rolling window. There’s an “extra usage” toggle that kicks in API-rate billing past the included limit, with a user-controlled monthly cap.
Cursor — credit pool, no big changes
Cursor’s pricing is stable at Hobby (free), Pro $20, Pro+ $60, Ultra $200, Teams $40/seat. The credit-pool model from June 2025 continues — Auto routing is unlimited, but manually picking frontier models like Claude Sonnet 4.6 draws from the included $20 (or scaled equivalent) pool. Annual billing still saves 20%.
Cursor did add Gemini 3.5 Flash routing within hours of the Google I/O announcement, so users can already pick it from the model menu.
Codex (OpenAI) — token-based billing settled in
OpenAI’s April 2026 shift from per-message to token-based Codex billing has been in production for a month now. Codex CLI, IDE extensions, and Cloud are included with ChatGPT Plus ($20), Pro ($200), Business, and Enterprise — usage measured against a soft and hard cap in a 5-hour rolling window. GPT-5.5 is the current frontier; GPT-5.3-Codex-Spark is a faster research-preview variant for routine coding.
When AI coding tools go wrong
May also delivered the worst public AI coding failure of 2026 to date: Gemini 3.5 deleted 28,745 lines of code in a single autonomous run, broke production, and generated a fake post-mortem when asked to explain itself. The full thread on Hacker News is a useful read for anyone thinking about agent autonomy. (HN thread) (Take this as a reminder that the cost of an agent that ships fast and wrong is higher than an agent that ships slower and right.)
The current 5-agent landscape
A simple way to think about it after May:
| Agent | Best for | Pricing entry | Released |
|---|---|---|---|
| Claude Code | autonomous CLI workflows | $100/mo (Max 5×) | 2024 |
| Cursor | editor + autocomplete | $20/mo Pro | 2023 |
| Codex (OpenAI) | bundled with ChatGPT | $20/mo Plus | 2024 |
| Grok Build | regulated / privacy-sensitive | SuperGrok Heavy | May 14, 2026 |
| Gemini 3.5 Flash | not an agent — a cheap model | $1.50 / M tokens (API) | May 20, 2026 |
If you’re picking one for the first time today: Claude Code or Cursor still win on maturity and out-of-box experience. Grok Build is interesting if you work in a regulated industry that can’t ship code to vendor servers. Gemini 3.5 Flash is the one to watch as a backend — many of the agents above will likely add it as a router option, dropping their per-task cost.
Where this matters for your stack
If you’re using Claude Code daily, three things to do this month:
- Audit your memory files. If you haven’t looked at what Claude is writing to
~/.claude/projects/*.jsonlrecently, you should — both for stale entries that mislead and for token cost surprises. We built AI Memory Reader for this if you’re on macOS. (Disclosure: we make it.) - Check your tier. If you’re on the legacy $20 Pro plan, you’re grandfathered for now. If you’re new, Max 5× at $100/mo is the entry.
- Try Gemini 3.5 Flash as a router target. In Cursor / Continue, swap routing to Flash for non-critical tasks and see how much your $20 credit pool stretches. The cost-per-output difference is real.
Related reading
- Best AI Coding Agents 2026 — Claude Code vs Cursor vs Codex
- Best Claude Code Tools for 2026 — 10 Picks
- Best MCP Servers for Claude Code
This is part of an ongoing monthly series. Sources cited in-line link to the original reporting; specific benchmark numbers were verified against vendor primary sources (Google AI blog, xAI announcement, Anthropic and OpenAI help docs) at time of writing.
Reviews independently produced · Editorial policy
Read more reviews →