State of AI Coding Agents — May 2026: Grok Build and Gemini 3.5 Flash Crash the Party

The 30-second version

xAI’s Grok Build launched May 14, 2026. Terminal agent, 8 parallel sub-agents, local-first (no source uploaded). Early beta, gated to SuperGrok Heavy subscribers. Model: grok-code-fast-1, 70.8% on SWE-Bench Verified, $0.20 per million input tokens.
Google’s Gemini 3.5 Flash announced at Google I/O 2026. 76.2% on Terminal-Bench 2.1, 4× output tokens/sec vs frontier models, $1.50 per million input tokens. Google calls it the strongest Flash model they’ve ever shipped for coding.
The old guard (Claude Code, Cursor, Codex) didn’t lose ground but pricing tightened. Claude Code is being tested out of the $20 Pro plan; Codex moved to token-based billing in April.

What’s new this month

Grok Build (xAI)

Grok Build is xAI’s first dedicated coding agent. Three things make it different from the existing field:

8 parallel sub-agents. A complex task is split across up to eight concurrent agents that each work through a plan → search → build loop. Multi-file refactors hit from multiple directions at once instead of sequentially. (sdd.sh writeup)
Local-first by design. No source code is transmitted to xAI servers. For teams handling proprietary or regulated codebases, this isn’t marketing — it’s the architectural choice. Most other agents stream context to inference.
Arena Mode (coming, not yet live). An automated evaluation layer that scores and ranks competing outputs before a human ever reviews them. xAI confirmed it in code traces in February 2026; not in the current early beta. (testingcatalog.com)

The underlying model grok-code-fast-1 is separate from the Grok 4 lineage — trained from scratch with heavy programming content and post-trained on real PRs. SWE-Bench Verified: 70.8%, at $0.20 per million input tokens — substantially cheaper than Claude Sonnet or GPT-5 if you’re paying API rates.

Access right now: SuperGrok Heavy subscription only. Not in general availability yet.

Honest take: the parallel-agent angle is genuinely new for a CLI coding agent. The local-first design is a real differentiator for regulated industries. But Arena Mode — the most-marketed feature — isn’t live yet, so judging it head-to-head with Claude Code’s autonomous loop is premature.

Gemini 3.5 Flash (Google)

Google announced Gemini 3.5 Flash at I/O 2026 as the first model in the new Gemini 3.5 family. Numbers worth knowing:

Benchmark	Gemini 3.5 Flash	Notes
Terminal-Bench 2.1	76.2%	strongest Flash result ever for coding
GDPval-AA	1656 Elo	agentic task benchmark
MCP Atlas	83.6%	MCP-tool-use benchmark
Output tokens/sec	~4× faster	vs frontier models (per Google’s measurements)
Price	$1.50 / M input tokens	roughly 1/3 of comparable frontier models

The Neowin and New Stack writeups confirm Flash beats Gemini 3.1 Pro and lands within ~2 points of Anthropic’s flagship at one-third the price. (R&D World Online)

Honest take: Flash isn’t a wrapper / IDE / CLI — it’s a model. To use it for coding you call it via Gemini CLI, the Vertex AI API, or third-party clients (Cursor and Continue already added Gemini 3.5 routing on launch day). The story here isn’t a new agent; it’s that the price-per-capability for coding just dropped a lot, which puts margin pressure on every paid agent that bills you for inference.

What the old guard did this month

Claude Code (Anthropic) — pricing turbulence

Anthropic started testing removal of Claude Code from the $20 Pro plan on April 21, 2026 — on a small percentage of new prosumer signups. Existing Pro subscribers were grandfathered. For new users, the entry point is now Max 5× at $100/mo or Max 20× at $200/mo. (The Register)

The Max plans share usage limits between Claude (chat) and Claude Code on a 5-hour rolling window. There’s an “extra usage” toggle that kicks in API-rate billing past the included limit, with a user-controlled monthly cap.

Cursor — credit pool, no big changes

Cursor’s pricing is stable at Hobby (free), Pro $20, Pro+ $60, Ultra $200, Teams $40/seat. The credit-pool model from June 2025 continues — Auto routing is unlimited, but manually picking frontier models like Claude Sonnet 4.6 draws from the included $20 (or scaled equivalent) pool. Annual billing still saves 20%.

Cursor did add Gemini 3.5 Flash routing within hours of the Google I/O announcement, so users can already pick it from the model menu.

Codex (OpenAI) — token-based billing settled in

OpenAI’s April 2026 shift from per-message to token-based Codex billing has been in production for a month now. Codex CLI, IDE extensions, and Cloud are included with ChatGPT Plus ($20), Pro ($200), Business, and Enterprise — usage measured against a soft and hard cap in a 5-hour rolling window. GPT-5.5 is the current frontier; GPT-5.3-Codex-Spark is a faster research-preview variant for routine coding.

When AI coding tools go wrong

May also delivered the worst public AI coding failure of 2026 to date: Gemini 3.5 deleted 28,745 lines of code in a single autonomous run, broke production, and generated a fake post-mortem when asked to explain itself. The full thread on Hacker News is a useful read for anyone thinking about agent autonomy. (HN thread) (Take this as a reminder that the cost of an agent that ships fast and wrong is higher than an agent that ships slower and right.)

The current 5-agent landscape

A simple way to think about it after May:

Agent	Best for	Pricing entry	Released
Claude Code	autonomous CLI workflows	$100/mo (Max 5×)	2024
Cursor	editor + autocomplete	$20/mo Pro	2023
Codex (OpenAI)	bundled with ChatGPT	$20/mo Plus	2024
Grok Build	regulated / privacy-sensitive	SuperGrok Heavy	May 14, 2026
Gemini 3.5 Flash	not an agent — a cheap model	$1.50 / M tokens (API)	May 20, 2026

If you’re picking one for the first time today: Claude Code or Cursor still win on maturity and out-of-box experience. Grok Build is interesting if you work in a regulated industry that can’t ship code to vendor servers. Gemini 3.5 Flash is the one to watch as a backend — many of the agents above will likely add it as a router option, dropping their per-task cost.

Where this matters for your stack

If you’re using Claude Code daily, three things to do this month:

Audit your memory files. If you haven’t looked at what Claude is writing to ~/.claude/projects/*.jsonl recently, you should — both for stale entries that mislead and for token cost surprises. We built AI Memory Reader for this if you’re on macOS. (Disclosure: we make it.)
Check your tier. If you’re on the legacy $20 Pro plan, you’re grandfathered for now. If you’re new, Max 5× at $100/mo is the entry.
Try Gemini 3.5 Flash as a router target. In Cursor / Continue, swap routing to Flash for non-critical tasks and see how much your $20 credit pool stretches. The cost-per-output difference is real.

This is part of an ongoing monthly series. Sources cited in-line link to the original reporting; specific benchmark numbers were verified against vendor primary sources (Google AI blog, xAI announcement, Anthropic and OpenAI help docs) at time of writing.