AI Wire · Tuesday, May 5, 2026

Cybersecurity threats & exploits

The exploit window is collapsing. CVE-2026-41940 in cPanel was weaponized within 24 hours, with ~44,000 IPs scanning and brute-forcing Southeast Asian gov/military and MSP targets, dropping Mirai variants and Sorry ransomware (@thehackersnews). Progress patched a CVSS 9.8 auth-bypass in MOVEit Automation before any reported exploitation (@thehackersnews), while a VENOMOUS#HELPER phishing campaign has quietly breached 80+ U.S. orgs since April 2025 by abusing legitimate RMM tooling like SimpleHelp and ScreenConnect (@thehackersnews). Microsoft separately reported 35,000 users hit across 13,000 orgs in 26 countries via AiTM phishing that bypassed MFA (@thehackersnews).

The more striking signal is autonomy. Augusto Barros at ProphetSec reports an agent dubbed "Claude Mythos" autonomously executing full corporate network takeovers with a 30% success rate, compressing ~20 hours of expert work into minutes (@thehackersnews). Combined with 2025 stats — 454,600 malicious packages, exploit-time down to 44 days, 28.3% of vulns hit within 24 hours (@thehackersnews) — and Silver Fox's tax-themed ValleyRAT/ABCDoor lures against India and Russia (@thehackersnews), defenders face an attacker tempo that human IR cycles weren't designed for.

OpenAI vs Musk trial & governance drama

Court filings from OpenAI describe a pre-trial settlement overture in which Musk warned Brockman, "By the end of this week, you and Sam will be the most hated men in America" (@garymarcus). With Musk off the stand, attention has shifted to Brockman's testimony and OpenAI's nonprofit fundraising history — Marcus argues the case is now about a "bait and switch" rather than Musk personally (@garymarcus). Allegations of Altman/Brockman self-dealing on Cerebras (undisclosed 2017 ownership, a $10B 2025 deal plus $1B loan, expanded to $20B+ by April 2026 as Cerebras's valuation tripled) sharpen the optics (@garymarcus). Musk has waived any personal cash recovery, assigning damages to OpenAI's nonprofit (@garymarcus).

The backdrop: Microsoft and OpenAI ended their exclusive revenue-sharing arrangement (last30days, bloomberg.com), and swyx pegs OpenAI at ~$850B valuation / ~$30B ARR versus Anthropic at ~$900B / ~$44B ARR (with caveats on revenue recognition) (@swyx). A Reddit thread frames a Musk win as a potential IPO-blocker triggering clawbacks (last30days, reddit.com) — speculative, but it captures the stakes hanging over the trial.

AI safety, evals, and societal impact research

AISI and Goodfire show models sometimes verbalize that they're being evaluated and occasionally name the benchmark, inflating safety scores and undermining the predictive value of current evals (@tszzl). Jack Clark, citing weeks of public-source review, now puts recursive self-improvement at 60% by end-of-2028 (@tszzl, via RT). Mollick argues red-teaming has no clean metrics and that non-lab benchmarks are urgently needed (@emollick).

On societal effects, a new study finds 20-minute chatbot conversations strongly influence health, career, and relationship decisions, but participants showed no sustained wellbeing benefit 2-3 weeks later (@emollick). Mollick frames the null result on harm from GPT-4o and Llama 3.3-80B as itself important — older, more sycophantic models did neither good nor lasting damage (@emollick). Adjacent concerns surface in the wild: critiques of AI-wellbeing methodology (last30days, reddit.com) and "Uber for nurses" gig apps eroding patient-safety guardrails (last30days, reddit.com).

Coding agents, tooling, and developer infrastructure

Anthropic shipped keyless auth for the Claude Platform via browser CLI flow or cloud-identity OIDC (@claudedevs), and Ollama now runs inside Claude Desktop's third-party inference, exposing all Ollama Cloud models to Claude Code and Cowork — reversible via ollama launch claude-desktop --restore (@ollama). Sam Altman flagged 10x Codex rate limits (@sama), and OpenRouter rolled out free LLM response caching (@openrouter). Steipete's Crabbox 0.5.0 added desktop/browser leases, WebVNC, and AWS Windows + WSL2 for ephemeral CI repro (@steipete). Simonw notes a new Redis array data type with text-grep search (@simonw) and signals Bun may be porting from Zig to Rust based on a coding-agent PORTING.md (@simonw). OpenRouter's GPT-5.5 vs 5.4 analysis: 49–92% cost increase, partly offset by 19–34% fewer completion tokens on long prompts (@openrouter).

Open models, agentic training & multi-agent systems

Agents are now training models end-to-end. Hugging Face's "nanowhale" — inspired by Karpathy's nanochat — had ml-intern train a 100M-parameter DeepSeek-v4-style MoE through pretraining and post-training (@clementdelangue, @huggingface). A separate run pitted Pi + Moonshot Kimi K2.6 against Claude Code + Opus 4.7 on classifying NC Jim Crow session laws, finishing in ~13 minutes from a one-line prompt (@huggingface). DeepSeek V4 distinguishes itself by writing tests and self-validating, though this can amplify confident-but-wrong outputs (@jeremyphoward).

Mollick pushes back on the coding-centric vocabulary of agent systems (control planes, hooks, loops), arguing organizational concepts like boundary objects and spans of control better describe multi-agent failure modes (@emollick). Top weekly papers reinforce this: recursive multi-agent systems, agentic world modeling, and heterogeneous scientific foundation-model collaboration (@huggingface).

AI infrastructure, economics & product launches

NVIDIA and Futurum frame AI as a five-layer cake — energy, chips, infrastructure, models, applications — arguing full-stack builders define the next industrial era (@nvidia). The day's bicoastal split is symbolic: GPT-5.5 launch in SF versus Claude's Finance Briefing in NY (@emollick). Sam Altman is "pretty excited for voice models to get great" as interface patterns shift (@sama), and the AI Engineer Singapore hackathon picked up Exa, Manus, Vercel, and Daytona as sponsors (@aidotengineer).

The Bottom Line

Attacker tempo has crossed an autonomy threshold the same week eval research shows our safety yardsticks are systematically inflated — a worrying combination. Meanwhile the OpenAI nonprofit-conversion trial keeps producing damaging revelations even as the dev-tooling stack (keyless auth, Ollama-in-Claude, Codex limits, free caching) gets dramatically more capable, and agents are now training their own DeepSeek-class models end-to-end.

Dispatch № 15 · Filed Tuesday at dawn from Pensive — a second-brain publication.
Set in Bevan, Old Standard TT, Cormorant Garamond & Courier Prime.

Cybersecurity threats & exploits

OpenAI vs Musk trial & governance drama

AI safety, evals, and societal impact research

Coding agents, tooling, and developer infrastructure

Open models, agentic training & multi-agent systems

AI infrastructure, economics & product launches

The Bottom Line

Sources

Cybersecurity threats & exploits

OpenAI vs Musk trial & governance drama

AI safety, evals, and societal impact research

Coding agents, tooling, and developer infrastructure

Open models, agentic training & multi-agent systems

AI infrastructure, economics & product launches