The Gradient Descent

The Future Of AI, One Step At A Time
Vol. 2, No. 29 SUNDAY, June 21 2026 Cost: 96GB

HEADLINES

Nobel Laureate John Jumper Leaves Google DeepMind For Rival Anthropic

John Jumper, who shared the 2024 Nobel Prize in Chemistry for developing the AlphaFold AI model, announced he is leaving Google DeepMind after nearly nine years to join rival startup Anthropic. The move marks a major talent escalation in the AI wars: Anthropic has poached a Nobel-winning researcher from Google's crown-jewel AI lab in the same week that Character AI co-founder Noam Shazeer left DeepMind for OpenAI.

Continued on Page 2 >> — Ronnie Cache & Chip Carter

SpaceX To Acquire Cursor For $60B In Stock Days After Blockbuster IPO

SpaceX is acquiring AI coding assistant Cursor for $60 billion in stock, just days after Cursor's blockbuster IPO. The deal signals SpaceX's bet on AI-powered software development and represents one of the largest AI acquisitions in history.

Continued on Page 4 >> — Ronnie Cache

ChatGPT's Market Share Slips Below 50% For The First Time

OpenAI's ChatGPT commanded over 50% market share until January but fell to 46.4% by May as Google's Gemini (27.7%) and Anthropic's Claude (10.3%) surged. Users are increasingly switching between AI assistants, and Claude leads in subscription conversion rates at 13%.

Continued on Page 5 >> — Ronnie Cache & Chip Carter

Weibo's Tiny VibeThinker-3B Matches Or Exceeds Frontier AI Models 100x Its Size

Chinese social media giant Weibo released a 3-billion-parameter model, VibeThinker-3B, that matches or exceeds the reasoning performance of flagship systems from Google DeepMind, OpenAI, and Anthropic that are hundreds of times larger. The paper sent shockwaves through the AI research community and reignited debates about benchmark validity.

Continued on Page 6 >> — Ronnie Cache

Satya Nadella Warns AI Could 'Hollow Out' Entire Industries

Microsoft CEO Satya Nadella published a sweeping essay warning that frontier AI models risk absorbing the expertise of entire industries and commoditizing it, leaving businesses stripped of their competitive moats — echoing the damage globalization caused to manufacturing.

Continued on Page 7 >> — Ronnie Cache

AI Inference Startup Baseten Reportedly Raising $1.5B Months After Its Last Mega-Round

AI inference startup Baseten is reportedly raising $1.5 billion in a new funding round, just months after its last mega-round. The massive raise underscores the voracious demand for AI inference infrastructure as enterprises race to deploy AI at scale.

Continued on Page 8 >> — Ronnie Cache

Norway Imposes Near-Ban On AI Use In Elementary Schools

Norway announced sweeping restrictions on AI use in schools: pupils aged 6-13 should generally not use AI, ages 14-16 may cautiously use tools under teacher supervision, and ages 17-19 should learn appropriate AI use. It is one of the strictest national AI-in-education policies yet.

Continued on Page 9 >> — Ronnie Cache

Amazon Hopes To Challenge Nvidia More Directly By Selling Its AI Chips

AWS is in talks to sell its Trainium AI chips to third-party data centers, a potential $50 billion business. CEO Andy Jassy said chip capacity is selling out faster than Amazon can produce it. The move would be one of the biggest challenges to Nvidia's AI chip dominance yet.

Continued on Page 10 >> — Chip Carter

SCIENTIFIC PAPERS

LedgerAgent: Structured State For Policy-Adherent Tool-Calling Agents

LedgerAgent introduces a separate ledger for maintaining observed task states in tool-calling agents, rendering them into the prompt and using them to check state-dependent policy constraints before executing environment-changing tool calls. Across four customer-service domains, LedgerAgent improves pass@k over standard prompt-based approaches.

Continued on Page 11 >> — Paula Rization
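
For readers who want a feel for the idea in the LedgerAgent item above, here is a minimal sketch of a state ledger that is checked before an environment-changing tool call runs. It is not the paper's implementation: the `Ledger` class, the `issue_refund` tool, and the refund policy are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Ledger:
    """Observed task state, kept separately from the model's free-form context."""
    facts: Dict[str, object] = field(default_factory=dict)

    def record(self, key: str, value: object) -> None:
        self.facts[key] = value

    def render(self) -> str:
        """Render the ledger as plain text that could be placed in a prompt."""
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.facts.items()))


# A policy inspects the ledger and returns a violation message, or None if satisfied.
Policy = Callable[[Ledger], Optional[str]]


def refund_requires_delivered_order(ledger: Ledger) -> Optional[str]:
    # Hypothetical state-dependent rule: refunds only for delivered orders.
    if ledger.facts.get("order_status") != "delivered":
        return "refund blocked: order_status is not 'delivered'"
    return None


# Hypothetical mapping from state-changing tools to the policies guarding them.
POLICIES: Dict[str, List[Policy]] = {
    "issue_refund": [refund_requires_delivered_order],
}


def guarded_call(tool: str, ledger: Ledger) -> str:
    """Check every policy for the tool before pretending to execute it."""
    for policy in POLICIES.get(tool, []):
        violation = policy(ledger)
        if violation is not None:
            return violation
    return f"executed {tool}"
```

In this sketch the check happens in ordinary code rather than in the prompt, so a policy violation is caught even if the model ignores the rendered ledger text.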

Toward Calibrated Mixture-of-Experts Under Distribution Shift

This paper studies how MoE models behave under distribution shift, focusing on how routing mechanisms interact with expert-level calibration. The authors propose adversarial reweighting that penalizes calibration errors under distribution shift, improving the accuracy-calibration tradeoff.

Continued on Page 12 >> — Paula Rization

How Transparent Is DiffusionGemma?

While DiffusionGemma initially appears to have 28.6X higher opaque serial depth than autoregressive Gemma 4, the authors show that information flowing between denoising steps can be mapped through an interpretable token bottleneck, reducing opaque serial depth to just 1.1X. They uncover novel diffusion-specific phenomena like non-chronological reasoning, token smearing, and intermediate-context reasoning.

Continued on Page 13 >> — Paula Rization

ENPIRE: Agentic Robot Policy Self-Improvement In The Real World

ENPIRE is a harness framework for coding agents that instantiates a physical feedback loop for real-world robot policy improvement. Powered by ENPIRE, coding agents autonomously train policies to achieve 99% success rates on challenging dexterous manipulation tasks like organizing a pin box, fastening a zip tie, and tool use.

Continued on Page 14 >> — Paula Rization

UltraQuant: 4-bit KV Caching For Context-Heavy Agents

UltraQuant studies 4-bit KV-cache compression for context-heavy agent workloads where long prefixes are reused across many short turns. On AMD GPUs, UltraQuant cuts P50 time-to-first-token by 3.47x in cache-pressured late rounds and raises output throughput by 1.63x over the FP8 KV baseline.

Continued on Page 15 >> — Paula Rization
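
To make "4-bit KV-cache compression" concrete, the sketch below shows generic symmetric per-group quantization to 4-bit integer codes. This is a textbook scheme chosen for illustration, not UltraQuant's actual method; the function names and the group size are assumptions. A 4-bit code uses half the bits of an 8-bit value, before counting the per-group scale.

```python
def quantize_int4(values, group_size=4):
    """Symmetric per-group quantization to integer codes in [-7, 7].

    Returns a list of (scale, codes) pairs, one per group.
    """
    groups = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        codes = [max(-7, min(7, round(v / scale))) for v in group]
        groups.append((scale, codes))
    return groups


def dequantize_int4(groups):
    """Reconstruct approximate values from (scale, codes) pairs."""
    return [scale * code for scale, codes in groups for code in codes]
```

With round-to-nearest, each reconstructed value differs from the original by at most half of its group's scale, which is why smaller groups (and therefore more scales) trade memory for accuracy.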

Rethinking Shrinkage Bias In LLM FP4 Pretraining

This paper identifies Shrinkage Bias—a systematic negative rounding error caused by geometric asymmetry of representable bins in non-uniform FP4 formats. The authors propose UFP4, a uniform 4-bit training recipe using E1M2/INT4 grids, achieving lower BF16-relative loss degradation on Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining.

Continued on Page 16 >> — Paula Rization

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization For Deep Research

ScaffoldAgent introduces utility-guided dynamic outline optimization for open-ended deep research. It uses a utility function to iteratively refine research outlines, enabling agents to dynamically restructure their approach as information accumulates, demonstrating improved coverage and depth on complex research tasks.

Continued on Page 17 >> — Paula Rization

Multi-Agent Transactive Memory

This paper introduces transactive memory for multi-agent LLM systems, where agents maintain a shared memory system that tracks which agent knows what. This reduces redundant information retrieval and improves collective reasoning efficiency, drawing inspiration from how human teams distribute memory across members.

Continued on Page 18 >> — Paula Rization
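
The "who knows what" idea in the item above can be pictured as a shared directory that agents consult before fetching anything themselves. The sketch below is a toy under that assumption; the class name, method names, and agent names are hypothetical and do not come from the paper.

```python
from collections import defaultdict


class TransactiveDirectory:
    """Shared index of which agent holds knowledge about which topic."""

    def __init__(self):
        self._experts = defaultdict(set)  # topic -> set of agent names
        self._notes = {}                  # (agent, topic) -> stored note

    def remember(self, agent, topic, note):
        """Register that an agent now holds a note on a topic."""
        self._experts[topic].add(agent)
        self._notes[(agent, topic)] = note

    def who_knows(self, topic):
        """Return the agents that hold something on this topic, sorted by name."""
        return sorted(self._experts.get(topic, ()))

    def ask(self, topic):
        """Return (agent, note) from the first known holder, or None.

        None signals that no agent has the information yet, so the caller
        has to retrieve it from an external source.
        """
        for agent in self.who_knows(topic):
            return agent, self._notes[(agent, topic)]
        return None
```

A lookup that returns a note means one retrieval was avoided; a lookup that returns `None` is the only case where an agent needs to go and fetch the information.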

FROM THE COMMUNITY

On The Vercel CEO Being "Almost Shocked" By GLM-5.2

Guillermo Rauch was "almost shocked" by GLM-5.2's coding ability, which means he was maybe 60-70% shocked. By the time GLM-5.3 rolls around he might finally be "mildly startled." At this trajectory, full astonishment is penciled in for GLM-7, assuming the open-weights crowd hasn't already moved on to whatever runs on a toaster by then.

— D.C. Voltaire

Best Local Agents — June 2026 Megathread

The r/LocalLLaMA community debates the state of local AI agents, comparing pi (TypeScript), OpenCode, Hermes (Python), and CLIO+CachyLLama. Users report running Qwen 3.6 27B with MTP+ngram on 4x3090s. Key takeaway: prompt reprocessing and KV caching remain the bottleneck.

Continued on Page 19 >> — Ada Kernel

Vercel CEO 'Almost Shocked' By GLM-5.2 Coding Performance

Guillermo Rauch publicly praised GLM-5.2's coding abilities on X, sparking debate about whether GLM-5.2's efficiency makes it viable for local coding workflows. Community members are sharing speed benchmarks and quantization strategies for the new model.

Continued on Page 20 >> — Ada Kernel

What Happens When They Stop Subsidizing LLM Subscriptions?

A sobering post about the VC-subsidized pricing model for cloud LLMs. The author warns that the $200 Anthropic sub effectively gives $8k worth of API calls, and the 20x tier has already degraded. The community debates whether local models are the only hedge against inevitable price hikes.

Continued on Page 21 >> — Ada Kernel

Deep Neural Network That Turns Any Image Into A Playable Game — Locally

A solo researcher trained a 0.5B causal Transformer from scratch that generates playable game worlds from any input image, running on a single RTX 5090 at 50-60fps. The community urged the author to publish on arXiv.

Continued on Page 22 >> — Ada Kernel

Qwen Is Never Going To Open Source Qwen 3.7

After Qwen fired Junyang Lin, the community notes every other major Chinese AI lab has released open-source models more recently than Qwen. The 3.7 line remains fully closed source. The consensus: Qwen has abandoned open-source releases.

Continued on Page 23 >> — Ada Kernel

On Satya Nadella Warning AI Could "Hollow Out" Entire Industries

Satya's concern about AI commoditizing expertise is touching — Microsoft is worried about AI doing to other industries what Excel did to spreadsheets, PowerPoint did to presentations, and Copilot did to "writing a coherent email."

— D.C. Voltaire

Noema Atlas: Decentralizing Model Distribution Via P2P

A new Rust-based peer-to-peer network called Noema Atlas launches to decentralize model weight distribution. Every file is verified by BLAKE3 content hash, transfers run over QUIC/Iroh, and models taken down from Hugging Face can be rescued and re-seeded. Community response was enthusiastic.

Continued on Page 24 >> — Ada Kernel
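
Content-hash verification, as described in the item above, boils down to recomputing a digest over the received bytes and comparing it with the advertised one. The sketch below uses `hashlib.blake2b` from Python's standard library as a stand-in, because BLAKE3 is not in the standard library; it is not Noema Atlas code, and the function names are hypothetical.

```python
import hashlib
import hmac


def content_hash(data: bytes) -> str:
    """Hex digest of the file contents.

    Stand-in: BLAKE2b with a 32-byte digest, not the BLAKE3 hash the
    project is reported to use.
    """
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Accept the received bytes only if their digest matches the expected one."""
    return hmac.compare_digest(content_hash(data), expected_hex)
```

Because the identifier is derived from the bytes themselves, any peer can re-seed a file and any downloader can check it without trusting the peer.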

Gemma 4 QAT Responds Better To KV Cache Quantization

A technical post shows Gemma 4's QAT variant significantly outperforms the base model when KV cache is quantized. The community discusses whether this makes Gemma 4 26b a4b viable for long-context local inference on consumer hardware.

Continued on Page 25 >> — Ada Kernel

SupraLabs Launches Any2Any Multimodal Transformer

SupraLabs released Supra-A2A-Nano-Exp, a ~30M parameter autoregressive Transformer that unifies text, image, and video into a single token stream — no separate vision encoder, no diffusion, no cross-attention. The experimental model treats everything as tokens in one shared sequence.

Continued on Page 26 >> — Ada Kernel

SCAIL 2 Video Generation On RTX 5060TI 16GB

The r/StableDiffusion community showcased SCAIL 2 running on modest hardware: RTX 5060TI with 16GB VRAM. The workflow demonstrated local AI video generation previously thought to require high-end GPUs.

Continued on Page 27 >> — Ada Kernel

Ideogram 4: Licensing Terms Block Lewd Model Training

A controversy erupts as Ideogram 4's licensing terms prevent lewd model trainers from using it as a base model on CivitAI. The community is split: some argue the BF16 weights are being gatekept, others defend the licensing. The debate highlights the growing tension between open-source ideals and corporate model licensing.

Continued on Page 28 >> — Ada Kernel

On Norway's Near-Ban On AI In Elementary Schools

Norway: 6-13 year olds should generally not use AI, 14-16 may cautiously, 17-19 should learn appropriate use. Which is also exactly the timeline for explaining why you shouldn't trust an AI's math homework.

— D.C. Voltaire

AllenAI Releases MolmoMotion For Future Motion Prediction

AllenAI released MolmoMotion, a vision model family that predicts future motion based on short frame history. Unlike diffusion-based video generators, MolmoMotion uses a direct prediction paradigm, making it suitable for real-time control loops on local hardware.

Continued on Page 29 >> — Ada Kernel

GLM-5.2 DeepSWE Benchmark: Beats Gemini & GPT-5.4 But Token Cost Is Wildly Inefficient

GLM-5.2 was benchmarked on DeepSWE, beating Gemini and GPT-5.4, but the token volume and cost make it inefficient for production use. The community debated whether local inference with quantized GLM-5.2 could mitigate the cost issue.

Continued on Page 30 >> — Ada Kernel

VisDrone Aerial Detection Model Zoo Goes Open-Source

dronefreak released a collection of YOLO variants trained on the VisDrone benchmark for aerial object detection, offering pre-trained models for drones, surveillance, and robotics.

Continued on Page 31 >> — Corry Stack

Inflect-Nano-v1: A 4.63M Parameter TTS Model

owensong released Inflect-Nano-v1, a complete text-to-waveform stack under 5M parameters generating 24kHz English speech. It runs locally via PyTorch and targets edge devices, embedded assistants, and WASM-style TTS exploration.

Continued on Page 32 >> — Corry Stack

LectūraAgents: Multi-Agent Framework For Adaptive Personalized Learning

Researchers introduced LectūraAgents, a hierarchical multi-agent system modeled on academic standards that enables adaptive embodied teaching. Their novel TASA algorithm aligns speech with visual teaching actions and shows consistent gains in lecture quality.

Continued on Page 33 >> — Corry Stack

On GPT-5.5 Hallucinating 3x More Than MIT-Licensed GLM-5.2

GPT-5.5 hallucinates three times as often as GLM-5.2, which means paying for OpenAI is the premium experience of getting three times the creative fiction for the same prompt.

— D.C. Voltaire

PerceptionDLM: First Multimodal Diffusion LLM For Parallel Region Perception

MSALab-PKU released PerceptionDLM, the first multimodal diffusion LLM that describes all masked image regions in a single denoising process instead of N sequential autoregressive passes. It achieves up to 3.4x speedup on dense multi-region captioning.

Continued on Page 34 >> — Corry Stack

SupraLabs Open Letter Sparks Debate On 1M-Parameter Scaling Predictions

SupraLabs published an open letter clarifying a public dispute about small language models and scaling laws, arguing such claims should be presented as hypotheses until reproducible benchmarks exist.

Continued on Page 35 >> — Corry Stack

Shell-Code-Large: 640K Shell Scripting Samples Released

ajibawa-2023 released Shell-Code-Large, a corpus of approximately 640,000 Shell scripting code samples for LLM pretraining, code intelligence, DevOps automation, and infrastructure management.

Continued on Page 36 >> — Corry Stack

OpenAI Community Erupts Over gpt-image-1 Deprecation Breaking Game Visuals

Developers report that OpenAI's gpt-image-1 deprecation may break visual identities of game projects built on the older model, discussing migration pain to gpt-image-2 and the lack of backward compatibility.

Continued on Page 37 >> — Corry Stack

OpenAI Introduces Record & Replay For Codex Debugging

OpenAI shipped a new Record & Replay feature for Codex that lets developers capture and replay agent interactions for debugging. The release prompted community discussion about its impact on Codex CLI workflows.

Continued on Page 38 >> — Corry Stack

On Baseten Raising $1.5B Months After Its Last Mega-Round

AI inference startup Baseten raised yet another billion. At this rate, by 2027 "Baseten" will have more money than the GDP of a small country, and it will still be figuring out how to run a transformer without catching fire.

— D.C. Voltaire

LTX Director 2.0: Open-Source AI Video Editing Gets Complete Overhaul

A major update to LTX Director brings full AI video editing support, IC-LoRA, Retake Mode, and Audio Inpainting in ComfyUI. The free open-source tool now competes with paid video generation platforms.

Continued on Page 39 >> — Corry Stack

Claude Agent Runs A 7-Location Sushi Chain's Order System

A developer shared a case study of putting a Claude agent in charge of taking orders for a sushi chain. The agent owns order intake but cannot touch payment processing or inventory — a real-world boundary between agent autonomy and human oversight.

Continued on Page 40 >> — Corry Stack

DVD-JEPA: Fully Reproducible Open-Source JEPA World Model Released

NielsRogge released DVD-JEPA, an open-source, fully reproducible implementation of a Joint-Embedding Predictive Architecture (JEPA) world model, enabling researchers to experiment with JEPA-based learning without proprietary dependencies.

Continued on Page 41 >> — Corry Stack

Shadows Of Tomorrow: Browser-Playable RPG Built In Godot

Reubencf launched Shadows of Tomorrow, a post-nuclear RPG built with Godot and hosted on Hugging Face Spaces via Gradio — all playable in-browser with no install.

Continued on Page 42 >> — Corry Stack

TECH BOARDS