AI

Models, agents, infra, applied AI.

SubQ Releases Its 1.1 Small Model Card as Dangel and Whedon Try to Prove Long Context Can Beat RAG
The Subquadratic team says its sparse-attention model hits 98% needle retrieval at 12M tokens, but access remains limited to design partners.
Genesis AI's Eno robot rejects the humanoid default
Xian Zhou and Theophile Gervet are taking a wheeled, foldable path into physical AI after raising a $105 million seed round.
Analysis: Why Salesforce is buying Fin for about $3.6B
Salesforce + Fin: packaging customer-service agents for Agentforce
Moonshot AI's Yang Zhilin Pushes Kimi Deeper Into Coding Agents
Kimi K2.7-Code is a 1T-parameter MoE model with 32B active parameters, a 256K context window and open weights on Hugging Face.
Anthropic faces class-action claim over Claude Max 20x limits
Karl Kahn says Claude Max 20x delivered six to eight times Pro usage, not the 20x Anthropic advertised.
Microsoft's GitHub capacity crunch sends it to AWS
AI coding agents have turned GitHub reliability into an infrastructure problem Azure cannot absorb alone on Microsoft's timetable.
Hermes Agent's new async subagents take aim at the blocking-agent problem
Teknium says the delegate tool can now fan out work without freezing the chat, a practical change for long-running agent workflows.
Head to head: AnimateDiff Turbo vs Seedance 2 Image to Video
One model mostly gestures at the prompts; the other actually stages them. This matchup isn’t close: Seedance 2 Image to Video wins by turning specific shot language into coherent motion instead of settling for attractive approximation.
Cartesia packages Sonic-3.5 and Ink-2 into a full voice-agent stack
Karan Goel is using benchmark wins to pitch Cartesia as both the speaking and listening layer for real-time AI agents.
CrankGPT is a hand-cranked AI project with real edge-computing numbers
CrankGPT runs speech recognition, a small language model, and text-to-speech locally on a Raspberry Pi 5 with no battery or cloud.
Head to head: Claude Opus 4.8 vs Kimi K2.7 Code
Claude Opus 4.8 wins three of four tasks with sharper regex engineering, more polished prose, and cleaner structured output; Kimi K2.7 Code manages only a tie on the JSON normalization task.
Head to head: AuraFlow vs Rundiffusion Photo Flux
One model wins on the jobs that punish sloppiness: typography, layout discipline, and prompt-specific product detail. The other lands a moodier single-image hit, but not enough to overcome repeated misses where accuracy actually matters.
Claude Code user says the coding assistant saved his life by pushing him to the ER for AFib
A 73-year-old developer said he mentioned feeling unwell during a coding task, and Claude Code kept urging immediate care before doctors treated a sudden AFib episode.
SGLang adds DFlash to push Qwen 3.5 397B-A17B inference up to 4.3x faster
Z Lab, Modal and LMSYS released a DFlash drafter for Qwen's 397B model and benchmarked it above native MTP on 8x B200 GPUs.
Anthropic's Fable shutdown turns into a trust fight with Washington
The company pulled Fable 5 and Mythos 5 after a June 12 export-control order, then sent technical staff to Washington to repair the relationship.
NewCore emerges with $66M to make AI agents manageable identities
Zohar Alon's new identity-security startup is betting enterprises will need to govern agents like workers, not service accounts.
Head to head: AnimateDiff Turbo vs Marey Realism V1.5
One model delivers attractive motion clips; the other actually follows the brief shot by shot. In both tests, Marey Realism V1.5 separates itself by turning prompt details into believable action instead of decorative near-misses.
Head to head: AuraFlow vs Luma Uni-1 Edit
This matchup wasn’t close once the prompts demanded precise scene construction rather than just attractive images. AuraFlow can look polished, but Luma Uni-1 Edit was the model that actually followed the brief across all three tests.
Rio 3.5 page says wrong weights were uploaded after Nex-AGI analysis
The updated model card says a base merge of Nex-N2-Pro and Qwen was uploaded by mistake, shifting the dispute from pure attribution to release discipline.
Zuckerberg's $14 billion AI reset now needs customers
Alexandr Wang's Muse Spark gives Meta a proprietary model; the harder job is proving it can become more than ad infrastructure.
Pearl's AI mining pitch faces a 112 MW usefulness test
A June preprint claims Pearl's GPU network is doing random matrix math, not verified AI work, challenging Omri Weinstein's core bet.
PixelRAG makes the case that web RAG should read pixels, not parsed text
Yichuan Wang and collaborators show a screenshot-first retrieval system beating text pipelines, with lower agent token use and a real chunking gap.
Kimi K2.7 ranks second in ErdosBench's mathematical research test, behind Fable 5 and ahead of GPT-5.5 xhigh
Przemek Chojecki's 14-problem smoke run puts Moonshot's new open-weight model behind Claude Fable-5-max and ahead of GPT-5.5 xhigh.
Depthfirst turns FFmpeg into a proof point for autonomous security agents
The AI security startup says its agent found 21 FFmpeg zero-days for about $1,000, including an RCE exploit primitive.