AI — Page 14
Models, agents, infra, applied AI.
- Socket uncovers TrapDoor campaign stealing keys and wallets via open source packages
Socket researchers say the TrapDoor campaign planted credential-stealing payloads in more than 34 packages and 384+ versions, targeting crypto and AI developers.
- DeepMind preprint: LLM+Lean agent resolves 9 Erdos problems and 44 OEIS conjectures
Google DeepMind reports a full-featured AlphaProof Nexus agent solved 9 of 353 open Erdos problems at a few hundred dollars per problem and proved 44 of 492 OEIS conjectures; code and Lean proofs are on GitHub. This is an arXiv preprint and community validation is pending.
- Greg Brockman walks through the 72 hours that almost killed OpenAI
In a rare interview on The Knowledge Project, the OpenAI co-founder recounts quitting within hours and sketching a backup company, and explains why they stopped showing reasoning traces.
- Draw Things adds on-device Qwen 3.5 4B and a clearer mode selector on iOS and macOS
The app says it now runs a lightweight local LLM as an interrogator model and splits generation and editing into distinct modes for a cleaner workflow.
- Andrej Karpathy joins Anthropic after stints at OpenAI and Tesla
Aligned News flagged the move; Wikipedia lists Karpathy on Anthropic’s pretraining team in 2026, underscoring top-tier talent consolidation at frontier labs.
- BenchFlow to launch SkillsBench at ACM CAIS, presented by Google DeepMind
The San Francisco afterparty will spotlight 100+ expert-curated agent tasks, live demos of the BenchFlow SDK, and Kaggle’s new Agent Benchmarks.
- Anthropic says Project Glasswing surfaced 10,000+ high-severity bugs in a month
Cloudflare alone found 2,000 issues; Anthropic’s OSS scans show a 90.6% triaged true-positive rate and project nearly 3,900 high-or-critical bugs ahead.
- Atomic.chat says Qwen 3.7-max beat Opus 4.7 and GPT-5.5 in agentic Tetris-bot test
In a 10-iteration code-and-rewrite challenge, the team let each model build and train a Tetris bot, then compared the final agents.
- Solo founder Ben Cera says Polsia raised its own $30M financing at a $250M valuation, autonomously
Approaching $10M ARR, Cera says Polsia runs companies autonomously with one founder and AI, and even handled its own raise while he "just showed up for signatures."
- Klemen Kotar announces PSI-0.5, a promptable physical world model
Kotar framed PSI-0.5 as contrasting with world models focused on moving around scenes; the reposted announcement we saw did not include links to code, a demo, or a paper.
- Speridlabs exits stealth to build a spatial AI foundation model
The research lab outlined a staged plan for a single 3D-native model and an open-by-default posture; Pear VC and Base10 are backing the effort.
- Qwen 3.7 Max draws head-to-head comparisons with GPT and Gemini-class systems
Benchmark chatter has shifted from open vs weak baselines to parity talk, with hints that Qwen 3.7 Max posts a strong math score.
- Whip launches as a social feed of tappable AI mini apps and games
Creator samagra14 debuts Whip as a social home for playful, weird, useful AI mini apps and games, with no-code creation and a public download link.
- Cosmo aims to give AI agents a desktop UI, launched by Shiyuan on X
In an X thread, Shiyuan says Cosmo lets you type or speak from the desktop while the interface renders live; some demo changes are not live yet.
- Runway pushes Aleph 2.0 into an editing workflow with new Edit Studio
Runway is steering its video model toward controllable, preview-first editing, not just one-shot generation, per a thread on X and a new Generative Session page.
- Microsoft starts canceling Claude Code licenses, pushes engineers to GitHub Copilot CLI
Internal memo sets a June 30 cutoff for most Claude Code seats; Microsoft frames the shift as convergence on a first-party CLI and a cost move tied to fiscal year-end, per The Verge.
- Inside 'Google Zero': AI answers move intent off the open web
Founders and indie publishers describe the same analytics signature: Google's summary-style results, including AI Overviews, satisfy queries on the results page and thin out the clicks that fund new work.
- ClickUp CEO Zeb Evans says agent-driven workflows can 100x orgs and 1000x top performers
In an X thread announcing a 22% cut, Evans framed a shift to agents and smaller, faster teams, citing a weeklong frontend architecture rebuild, 40x research, faster code review of agent outputs, and million-dollar bands for outsized impact.
- Anthropic-backed services firm acquires Fractional AI to anchor enterprise push
San Francisco-based Fractional AI, founded by Chris Taylor, Eddie Siegel, and Travis May, will be the operational core of a new firm backed by Anthropic, Blackstone, and H&F.
- Minkai Xu introduces Gemini Omni, a self-described world model, in a Google DeepMind X post
Researcher-builder Minkai Xu says he has been heads-down for months on a world model, with the announcement appearing on Google DeepMind's X account.
- OpenAI will watermark ChatGPT images with Google's SynthID and launch a provenance checker
Images from ChatGPT, the OpenAI API, and OpenAI Codex will carry SynthID and C2PA signals, with a checker to verify provenance and new participation in C2PA conformance.
- Lab0 says its AI FDE cuts enterprise rollouts from 6 months to 10 days
Founders say the agent automates discovery-to-go-live, from drafting docs and test plans to configuration and integrations; Lab0 also says it is already working with Adobe.
- Chinese cities are rolling out AI robot barber kiosks, Cointelegraph says
Cointelegraph said in a thread on X that 3D-scanning robot barber kiosks are appearing in Chinese cities for 60 yuan a cut, but did not name the maker or list specific cities.
- Stanford cs329x publishes human-centered LLM playbook
The cs329x report, led by Caleb Ziems and Dora Zhao, maps design, data, tuning, evals, and deployment tradeoffs, and flags engagement-optimization and sycophancy incentives misaligned with goals like user empowerment and mastery.