Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation • Paper • 2605.26111 • Published 10 days ago • 11
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers • Paper • 2605.23892 • Published 13 days ago • 8
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models • Paper • 2603.17117 • Published Mar 17 • 87