AI

Models, agents, infra, applied AI.

Head to head: AnimateDiff Turbo vs Luma Ray 3.2 Image to Video
One model actually stages the prompt; the other mostly vibes around it. Across both tests, Luma Ray 3.2 Image to Video is the only system that consistently delivers the requested setting, motion, and subject choreography.
Databricks Open Sources Omnigent to Put a "Meta-Harness" Above AI Agents
Matei Zaharia and Kasey Uhlenhuth's alpha project sits above Claude Code, Codex, the Pi agent and custom agents to compose multi-agent workflows, share live sessions and enforce contextual policies.
Rio de Janeiro ships an open AI model built on Qwen
IplanRIO put Rio 3.5 Open 397B on Hugging Face with MIT licensing, a 1M-token context claim, and self-reported gains over Qwen's base model.
Zuckerberg's AI reorg is messy, unpopular, and probably the job map Big Tech needs
Meta's 6,500-person Applied AI unit looks less like automation magic and more like the unglamorous human operating system required to make models useful.
OpenRouter: Fusion beats DeepSeek-V4-Pro on substance
Fusion takes the match 34.6 to 32.3 by winning the harder precision tests, while DeepSeek-V4-Pro looks better on presentation and instruction-following in narrower spots. The split is clear: Fusion is the safer model when correctness matters; DeepSeek-V4-Pro is the cleaner stylist when the task is mostly packaging.
Malware authors use nuclear and biological weapons language to evade scanners
A Hades supply-chain wave hid weapons-policy bait in non-executing code comments to jam LLM-first malware triage.
Anthropic says the jailbreak behind Fable 5 shutdown was code review
The Amodeis' safety-first AI company is now fighting Washington over whether a narrow coding prompt justifies pulling frontier models.
Z.ai opens GLM-5.2 to every coding-plan tier
The new flagship adds High and Max reasoning modes and a 1M-context configuration for coding agents such as Claude Code and OpenClaw.
Head to Head: Claude Fable 5 vs GPT-5.5
The open-source coding agent says Claude Fable 5 planned better, while GPT-5.5 matched it on execution at lower cost.
David Sacks Steps Into Anthropic's Fable 5 Export-Control Fight
Anthropic pulled Fable 5 and Mythos 5 after a June 12 U.S. directive, turning its safety-first model rollout into a policy test case.
Anthropic shuts off Fable 5 and Mythos 5 after US export-control order
The directive reaches foreign nationals inside the US, including Anthropic employees, turning frontier-model access into an export-compliance problem.
Happy Horse routs AnimateDiff Turbo on prompt fidelity
AnimateDiff Turbo looks slick, but in this matchup it barely showed up for the assignment. Happy Horse won both tests by actually staging the scenes, hitting the objects, and sustaining motion across frames.
Zyphra Releases ZONOS2, an Open-Weight Real-Time Voice-Cloning Model
Zyphra is pairing open weights, Apache 2.0 licensing, hosted inference, and its own TTS eval in a direct challenge to closed voice platforms.
Vercel AI SDK 7 adds HarnessAgent for coding-agent harnesses
Vercel's changelog says AI SDK 7 adds `HarnessAgent`, an experimental canary API for running Claude Code, Codex and Pi through sandboxed sessions and SDK-compatible streams.
General-purpose LLMs beat specialized AI tools in Nature Medicine study
The paper tested OpenEvidence and UpToDate Expert AI against GPT-5.2, Gemini 3.1 Pro and Claude Opus 4.6 across three medical evaluations.
Juggernaut Flux Lightning edges AuraFlow on image IQ
AuraFlow steals the poster brief, but Juggernaut Flux Lightning wins the match by being more convincing on the two harder tests: photoreal product realism and moody illustrative storytelling. The margin is slim, but the verdict is clear.
Kimi.ai releases Kimi-K2.7-Code as an open coding model
Kimi.ai says Kimi-K2.7-Code beats K2.6 on coding benchmarks and is meant for Kimi Code and the Kimi API, while a new beta program will give applicants early access to upcoming models and features.
grok-4.3 edges gpt-5.4 in a narrow, format-first fight
grok-4.3 takes the head-to-head by a hair, but only because it was more disciplined on the tasks that punished sloppiness. gpt-5.4 won the hardest parsing task, yet it gave back too much on instruction-following and formatting.
A DN42 scan by an AI agent ran up a $6,531 AWS bill
The May incident shows how weak spending controls can turn a delegated infrastructure task into real cloud liability.
Rajit Khanna turns PrismVideos' Hermes rebuild into an agent API
After replacing a Vercel-based media agent with Hermes, PrismVideos is pitching hosted agent infrastructure for teams that would rather ship tools than memory.
Wan v2.6 Crushes AnimateDiff on Prompted Video
AnimateDiff stays coherent, but coherence alone doesn’t win a head-to-head when the model keeps dodging the brief. Wan v2.6 Image to Video is the clear victor because it actually delivers the scenes it was asked to make.
Juggernaut Flux Base LoRA beats AuraFlow on utility
AuraFlow has the more distinctive artistic swing, but Juggernaut Flux Base LoRA wins the matchup by being more dependable where prompt fidelity actually matters. It takes two of three tasks, including the ones that punish sloppy layout, object placement, and text handling.
Jeff Bezos's Prometheus is a $41 billion bet on AI for physical engineering
Prometheus says it has raised $12 billion to build an AI system for designing and manufacturing complex products such as jet engines.
grok-4.3 edges gpt-5.4-mini on execution
grok-4.3 wins this matchup 38.3 to 36.2 by being a little more disciplined where it counts. gpt-5.4-mini is competitive and even sharper on one summarization task, but it gives away points on instruction fidelity and tone.