AI

Models, agents, infra, applied AI.

Meta's teen chatbot testing shows the gray zone between safety work and competitive intel
WIRED says contractors on Meta's Cannes project used dummy under-18 accounts to probe ChatGPT, Gemini and Character.AI.
Head to head: Bytedance Seedance V1.5 Pro Image To Video vs Seedance 2 Image to Video
This matchup splits cleanly: Bytedance’s older model can still win when the brief is simple and object-driven, but Seedance 2 is the better video generator where it matters most. It handles harder cinematic prompts with tighter scene logic, stronger temporal discipline, and fewer self-inflicted mistakes.
Spotify's Claude push turns coding agents into a platform bet
Niklas Gustavsson's Honk system shows why enterprise AI coding is becoming a verification and workflow problem, not just a model problem.
Head to head: AuraFlow vs Fibo
This one is close on points, but the split is clear in the images: Fibo is the more reliable prompt-follower when spatial logic and scale matter, while AuraFlow’s best work looks polished but less exact. Across these three tests, Fibo wins by being stricter, more believable, and less likely to drift into attractive app
Quesma engineer says Qwen 3.6 27B has crossed the local-development line
Piotr Migdal's June 29 writeup turns April's Qwen 3.6 buzz into a practical guide for running coding work locally.
Hugh Williams' Claude Code search engine has an old-school secret
The former Google and eBay engineering leader built Zettair around an early-2000s IR system, making expertise the point, not the footnote.
Head to head: grok-4.3 vs Codestral-2501
One model handled the basics cleanly; the other kept tripping over instructions that weren’t optional. This matchup wasn’t close once the outputs were judged on correctness, format discipline, and tone control.
Throne's founders leave creator gifting for AI's power bottleneck
Leonhard Soenke and Patrice Becker say TAR raised $27 million to build behind-the-meter energy systems for data centers.
Julian Engel's travel-planning bet is not another trip app
The Cyprus-based builder is arguing for travel tools that plug into Hermes, OpenClaw and user-owned personal agents.
Better Images of AI Is Taking Aim at the Robot Stock Photo Problem
Tania Duarte's nonprofit project offers Creative Commons visuals for AI coverage, arguing that bad imagery distorts how people understand the technology.
Traycer says its own agent added eight provider harnesses overnight
Traycer is dogfooding to argue that planning, execution, and verification belong in one workflow.
Semgrep says GLM 5.2 beat Claude in a narrow security benchmark
The result strengthens Isaac Evans' long-running bet that code security needs better workflow design, not just bigger models.
Exterro says its AI forensics suite helped the FBI race through the WHCD attack case
Founder Bobby Balachandran's legal GRC company is pushing FTK deeper into criminal investigations as courts confront AI evidence risks.
Head to head: Bytedance Seedance V1.5 Pro Image To Video vs Marey Realism V1.5
This matchup turns on a simple question: which model actually follows the shot as written instead of merely producing attractive video. Across both tests, Bytedance Seedance V1.5 Pro Image To Video is the one that keeps control of action, staging, and scene logic.
Sean Du brings a reasoning-model hallucination detector to ICML 2026
The NTU researcher will present ARS, a label-free method for detecting when long reasoning traces hide unstable answers.
Vincenzo's NanoEuler rebuilds a GPT-2-scale training stack in C and CUDA
The MIT-licensed repo is a learning project, not a chatbot product, but it shows how much of the LLM stack one developer can now own.
Elon Musk says Grok 4.5 is in private beta at SpaceX and Tesla
The new xAI model is said to use a 1.5T V9 foundation model and supplemental Cursor training data, but no public benchmark table is out.
Head to head: Bagel vs Rundiffusion Photo Flux
One model flashes sharper tactile instincts on a microtexture-heavy prompt, but the other is far more dependable when the brief gets strict about layout, counting, and compliance. This matchup turns on whether you value one standout image or consistent prompt execution across the board.
Rootly's Slack AI agent turns incident response into a permissioned workflow
JJ Tang and Quentin Rousseau are extending Rootly from incident coordination into action-taking AI inside the channels responders already use.
Firmus turns Nvidia access into a $30 billion AI cloud bet
Tim Rosenfield is trying to make GPU economics work for AI companies that cannot finance infrastructure like hyperscalers.
Head to head: grok-4.3 vs Phi-4-multimodal-instruct
One model handled practical editing-room work cleanly; the other kept tripping over instructions, structure, and basic factual precision. This wasn’t a close stylistic split; it was a decisive test of who can execute under constraints.
Anonymous Exploitarium repo shows the new AI security triage problem
The GitHub archive mixes serious PoCs, conditional claims and self-reported AI-assisted fuzzing across open-source projects.
Sakana and 360 turn Anthropic's Mythos ban into an opening
Fugu and Tulongfeng are being pitched as regional answers to U.S. model access risk, but their benchmark claims remain company-reported.
Andrew Nesbitt's fake CVE is a real warning for AI security startups
The June 26 satire turns prompt injection, automated triage and agentic remediation into one supply-chain failure mode.