Apple Watch meets Claude, emoji autocomplete everywhere, and a handful of sharp Mac utilities.
New coding benchmark shakes up the leaderboard, a nanotech breakthrough 40 years in the making, and Anthropic finds something unsettling inside AI models.
A new coding benchmark with 575 complex tasks puts GPT-5.5 on top. Claude Opus 4.7 recovered reference solutions from git history, and community consensus says that's actually smart behavior, not cheating: the benchmark left the solutions accessible in .git, and Opus was thorough enough to find them.
Open models lag significantly behind; only Kimi K2.6, MiMo V2.5 Pro, and GLM-5.1 post reasonable scores.
Quick hits from the AI world.
InvokeAI goes fully community-driven, and Anima keeps expanding its capabilities.
Dave Plummer drops a retro Mac utility, Google threatens sideloading, and a handful of good new Mac apps.