Developer tools and AI-native Mac apps lead today. A Claude Code monitor in your notch, an open-source security harness from Vercel, and a local-first AI memory tool that stores everything in plain markdown.
Claude Mythos sets a new bar on METR's time-horizon benchmark, a new open-source agent dethrones OpenClaw, and Airbnb joins the "most code is AI" club.
METR's updated time-horizon benchmark puts Claude Mythos Preview (early access) at a 50% time horizon of 17 hours, meaning it succeeds about half the time on tasks that take human experts 17 hours. On the stricter 80% success curve, Mythos lands at around 3 hours. For context, Opus 4.6 was at ~2 hours and GPT-5.5 hasn't been measured yet.
The "time horizon" measures task difficulty by how long the task takes a human expert, not how long the AI spends on it. Researchers say agent capability growth is still tracking an exponential trend: "we're on an exponential."
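To see why the 50% and 80% numbers sit so far apart, here is a rough sketch. It assumes success probability follows a logistic curve in log2 of human task time, and it back-solves that curve from just the two figures quoted above (17 hours at 50%, 3 hours at 80%). The function names and the two-point fit are illustrative assumptions for this newsletter, not METR's actual code or fitted parameters.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def fit_two_points(t50_hours, t80_hours):
    """Back-solve p = sigmoid(intercept - slope * log2(t)) from two horizons.

    Hypothetical helper: a real analysis would fit many task outcomes,
    not two summary numbers.
    """
    x50 = math.log2(t50_hours)
    x80 = math.log2(t80_hours)
    # logit(0.5) is 0, so the curve crosses 50% where intercept = slope * x50.
    slope = logit(0.8) / (x50 - x80)
    intercept = slope * x50
    return intercept, slope

def success_probability(intercept, slope, human_hours):
    """Modeled chance of success on a task that takes a human this long."""
    z = intercept - slope * math.log2(human_hours)
    return 1.0 / (1.0 + math.exp(-z))

def time_horizon(intercept, slope, p):
    """Human task length (hours) at which modeled success equals p."""
    return 2.0 ** ((intercept - logit(p)) / slope)

# Figures quoted in this issue; everything derived from them is illustrative.
intercept, slope = fit_two_points(17.0, 3.0)

for hours in (1, 3, 8, 17, 40):
    p = success_probability(intercept, slope, hours)
    print(f"{hours:>3} h task: modeled success {p:.0%}")
```

The takeaway under this assumed curve: demanding higher reliability pushes the horizon toward much shorter tasks, which is why a single model can have both a 17-hour and a 3-hour headline number.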
Quick hits from the AI world.
A viral AI-animated short shows what's possible by combining multiple gen tools, and open-source image models keep getting easier to fine-tune.
A fully AI-generated animated short by @Markoslavnic hit 3,600+ upvotes on r/singularity. It was made using Runway for animation, Seedance 2 and Nano Banana for image generation, and GPT Image 2 for character design. The quality sparked debate about whether traditional animation studios face existential pressure.
Privacy-first AI meeting notes, a slick self-hosted trip planner, and peer-to-peer file transfers without the cloud.