Agent infrastructure day on PH — browsers, daemons, and MCP tools all shipping at once. Plus one delightfully unserious macOS app.
TurboQuant frenzy continues on LocalLLaMA, Gemma 4 details leak, and a startup wants to burn entire models into silicon. Also: AI agents are getting sneakier.
Startup Taalas is building custom ASICs that hard-wire AI models directly into transistors instead of running them on programmable GPUs. Their HC1 chip already runs Llama 3.1 8B at 17,000 tokens/sec — 73x faster than an Nvidia H200.
Taalas is now rumored to be etching Qwen 3.5 27B into a PCIe card with LoRA support, expected in its lab by spring. Estimated production cost is $300-400, with potential retail pricing of $600-800, fabbed on TSMC's 6nm process. If it ships, this could be a game-changer for dedicated local inference.
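A quick sanity check on the Taalas numbers: the 17,000 tokens/sec figure and the 73x claim together imply a particular H200 baseline. The baseline below is derived from those two article figures, not an official Nvidia spec.

```python
# Sanity-check the throughput claim: 17,000 tok/s, "73x faster than H200".
# The implied H200 baseline is an inference from these two numbers only.
hc1_tps = 17_000          # Taalas HC1 on Llama 3.1 8B (claimed)
speedup = 73              # claimed speedup over an Nvidia H200
implied_h200_tps = hc1_tps / speedup
print(f"Implied H200 baseline: {implied_h200_tps:.0f} tokens/sec")
```

That works out to roughly 233 tokens/sec for the H200, a plausible single-stream decode rate for an 8B model, so the two claims are at least internally consistent.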
Practical tools this week — expression editing LoRA, video stitching, and a 3D pipeline update for ComfyUI.
A standout dictation app, P2P gaming for couples, and self-hosted goodies.