AI video directors, persistent agent memory, color E-ink hardware, and a TUI Slack over SSH.
Multi-Token Prediction finally lands in llama.cpp, Mythos appears on Vertex AI, and Mistral's founder tells French Parliament that engineers no longer write code.
Long-awaited Multi-Token Prediction (MTP) support has been merged into llama.cpp master. It enables speculative decoding with draft MTP heads, which can substantially speed up generation on supported models. Early benchmarks show Qwen3.6-27B going from 7.6 to 16.2 tokens/s generation (+112%) on Strix Halo, while the 35B variant shows mixed results depending on configuration. Unsloth has started publishing MTP quants.
Flux goes real-time on webcam, DramaBox TTS gets LoRA support, and a new ComfyUI node generates synchronized foley sound from video.
A proper Android-to-Mac file manager, self-hosted podcast ad removal, and S3-native file sharing that never touches your server.