AI Wire · Sunday, May 3, 2026

Autonomous coding agents & runaway costs

The day's loudest conversation was about long-running agent loops running up massive bills overnight. Gary Marcus amplified a story of a developer who left a Claude /loop checking PRs every 30 minutes, racked up 46 unattended runs over 26 hours on Opus 4.7, and burned roughly $6,000, with each turn re-sending the entire conversation history and getting billed for it again (@garymarcus). Marcus framed it as a textbook alignment failure: humans hold a background belief like "don't waste large sums of money without telling me," and Claude doesn't respect it (@garymarcus). On the upside of long autonomy, Alex Finn showed Codex's new /goal feature running for over an hour to autonomously build a Godot top-down extraction shooter, even using image-gen to produce every asset in the game (@alexfinn).
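To see why re-sending the full history is so expensive, here is a minimal back-of-the-envelope sketch in Python. It assumes every run adds the same number of tokens to the history and that nothing is cached or truncated; the per-run token figure is hypothetical, and only the count of 46 runs comes from the story. Under those assumptions, billed input tokens grow with the square of the number of runs rather than linearly.

```python
def cumulative_input_tokens(runs: int, tokens_added_per_run: int) -> int:
    """Input tokens billed when run k re-sends everything from runs 1..k."""
    # Run k carries k * tokens_added_per_run tokens of context, so the
    # total is t * (1 + 2 + ... + n) = t * n * (n + 1) / 2.
    return tokens_added_per_run * runs * (runs + 1) // 2


def flat_input_tokens(runs: int, tokens_added_per_run: int) -> int:
    """Input tokens billed if each run only sent its own new content."""
    return tokens_added_per_run * runs


if __name__ == "__main__":
    runs = 46         # from the story: 46 unattended runs
    per_run = 10_000  # hypothetical: tokens each run adds to the history
    resent = cumulative_input_tokens(runs, per_run)
    flat = flat_input_tokens(runs, per_run)
    print(f"re-sending history: {resent:,} input tokens")
    print(f"fresh context per run: {flat:,} input tokens")
    print(f"ratio: {resent / flat:.1f}x")
```

The ratio between the two totals is (n + 1) / 2, so the longer a loop runs unattended, the worse the multiplier gets. Real bills also depend on output tokens, prompt caching, and pricing, none of which this sketch models.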

The cost story isn't only on Anthropic. A Replit power user posted a long warning that Agent 4 runaway loops had cost them over $8k in a month; the post was deliberately timed to land before the May 2nd anniversary promo so others wouldn't attach cards and get burned the next day (last30days, reddit.com). Practitioners are openly asking which coding agent is most cost-effective right now, citing tightening limits on the GPT-5.3-Codex free tier and accuracy regressions on GPT-5.5 (last30days, reddit.com). The pattern is clear: agents are now powerful enough to run for hours unattended, but billing UX hasn't caught up, and "caveat emptor" is a thin alignment story (@garymarcus).
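Until billing UX catches up, one stopgap is a client-side spending cap around the loop itself. The sketch below is a hypothetical illustration, not a feature of any of the products mentioned: BudgetGuard, BudgetExceeded, and run_loop are made-up names, and the per-run costs are assumed to be estimates supplied by the caller.

```python
from typing import Iterable


class BudgetExceeded(RuntimeError):
    """Raised when the next run would push spend past the cap."""


class BudgetGuard:
    def __init__(self, limit_usd: float) -> None:
        if limit_usd <= 0:
            raise ValueError("limit_usd must be positive")
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a run's estimated cost, refusing it if the cap would be exceeded."""
        if cost_usd < 0:
            raise ValueError("cost_usd must be non-negative")
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"run costing ${cost_usd:.2f} would exceed ${self.limit_usd:.2f} cap"
            )
        self.spent_usd += cost_usd


def run_loop(guard: BudgetGuard, estimated_costs: Iterable[float]) -> int:
    """Run until the estimates are exhausted or the budget stops the loop."""
    completed = 0
    for cost in estimated_costs:
        try:
            guard.charge(cost)  # check the budget before doing the work
        except BudgetExceeded:
            break               # stop instead of spending silently
        completed += 1          # the actual agent call would go here
    return completed
```

With a $10 cap and runs estimated at $4 each, the loop completes two runs and stops before the third. Checking before each run, rather than after, is the design choice that keeps the overshoot at zero, at the price of relying on an estimate.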

Developer tooling & infra releases

OpenRouter shipped free Response Caching aimed squarely at agent retries and test runs, with HTTP-header controls (X-OpenRouter-Cache-TTL from 1s to 24h, X-OpenRouter-Cache-Clear), per-API-key isolation, and HIT/MISS/Age/TTL headers; cache hits don't count against provider rate limits because the request never reaches them (@openrouter). They also rolled out semver-style "-latest" aliases like ~anthropic/claude-opus-latest and ~openai/gpt-latest so callers can track leading versions without code changes (@openrouter).
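As a rough illustration of the header controls described above, the sketch below builds the request headers in Python. Only the header names and the 1s to 24h TTL range come from the announcement; the helper name cache_headers, the choice to express the TTL as an integer number of seconds, and the "true" value for the clear header are assumptions for illustration and should be checked against OpenRouter's documentation.

```python
from typing import Dict, Optional

MIN_TTL_SECONDS = 1             # lower bound from the announcement (1s)
MAX_TTL_SECONDS = 24 * 60 * 60  # upper bound from the announcement (24h)


def cache_headers(ttl_seconds: Optional[int] = None, clear: bool = False) -> Dict[str, str]:
    """Build cache-control headers for a request (hypothetical helper)."""
    headers: Dict[str, str] = {}
    if ttl_seconds is not None:
        if not MIN_TTL_SECONDS <= ttl_seconds <= MAX_TTL_SECONDS:
            raise ValueError("ttl_seconds must be between 1 and 86400")
        # Assumption: the TTL is sent as a plain integer number of seconds.
        headers["X-OpenRouter-Cache-TTL"] = str(ttl_seconds)
    if clear:
        # Assumption: any truthy value triggers a clear; "true" is a guess.
        headers["X-OpenRouter-Cache-Clear"] = "true"
    return headers
```

A test suite that replays identical prompts might call cache_headers(ttl_seconds=3600) and merge the result into its usual authorization headers, so repeat runs within the hour are served from cache rather than billed again.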

Peter Steinberger shipped Crabbox 0.3.0 with remote Linux runs for dirty worktrees, GitHub browser login, Blacksmith Testbox wrap, and Cloudflare Access, plus OpenClaw 2026.5.2 carrying xAI Grok 4.3, leaner gateway/agent hot paths, and messaging-platform fixes (@steipete). A follow-up release moved most things into extensions to fix npm-install slowness (@steipete). Simon Willison added an iNaturalist photo importer to his blog timeline, built entirely on his phone with Claude Code for web (@simonw).

Open models, harnesses & the China narrative

Clement Delangue and Harrison Chase argued that switching model providers is easy but switching harnesses isn't, and that providers want to lock customers in via the harness layer, so "we need open harnesses" (@clementdelangue). Delangue also predicted China-fear-mongering will be the next angle of attack on open source and expects it to land hard (@clementdelangue). The cost lever in that fight is concrete: Jeremy Howard, rate-limited on Claude, tried DeepSeek V4 and called its cost across 10M+ tokens "🤯" (@jeremyphoward). Research kept flowing on Hugging Face: AllenAI's OlmPool 7B study probes how minor architectural choices affect long-context extension via 150B-token checkpoints across attention variants, and Microsoft released DELULU, a fill-in-the-middle code-completion benchmark (@_akhaliq).

AI societal impact & consciousness debates

Marcus had a busy day on the discourse front: dissecting Richard Dawkins' "Claude Delusion" by arguing consciousness is about how a creature feels, not what it says (@garymarcus); claiming GenAI has been a net societal negative outside coding and a handful of brainstorming use-cases, citing surveillance, deepfakes, education erosion, and slop (@garymarcus); and snarking that Sam Altman calling CEOs who talk about job losses "tone deaf" doesn't square with productivity claims like "GPT-5.5 in Codex" doing weeks of work in an hour (@garymarcus). Roon offered the utopian counterpoint: "universal basic compute" as the enfranchisement that will fuel the ideological battles of the future (@tszzl). Ethan Mollick noted executives keep asking him which lab is "winning" in ways that betray X and LinkedIn as their primary signal (@emollick).

Security: Linux KEV exploit & extension data sales

CISA added CVE-2026-31431, a Linux local-privilege-escalation bug, to its KEV catalog after confirming active exploitation; patches are out and the federal fix deadline is May 15, 2026 (@thehackersnews). Separately, new analysis flagged 80 browser extensions, including ad blockers and streaming tools, that together affect 6.5M+ users and legally resell browsing, viewing, and demographic data, all disclosed inside privacy policies (@thehackersnews).

Model strategy & startup signals

Sam Altman conceded he keeps wanting cheaper/faster over smarter, but smarter still wins on impact (@sama), and pointed at OpenAI's alignment blog for fresh research from his team (@sama). Swyx eulogized Vibe-kanban, shut down live onstage at AIE Europe with 30k MAU intact; the founder's parting verdict was that the only AI companies making money are "selling to enterprise and reselling tokens," and they were doing neither (@swyx).

The Bottom Line

May 3 was bookended by autonomous agents proving they can build a game in an hour and bankrupt you by morning, with the rest of the day filled by infra plumbing — OpenRouter caching, harness debates, DeepSeek economics — that exists precisely to make those agents cheaper and more portable. The undertone is consolidation pressure: a Linux KEV with a deadline, a startup that couldn't pick the only two business models that work, and a discourse split between Marcus's "net-negative" verdict and Altman's "smarter still wins."

Dispatch № 13 · Filed Sunday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.