Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models Paper • 2510.02880 • Published Oct 3, 2025
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension Paper • 2406.11327 • Published Jun 17, 2024
ChatterBox: Multi-round Multimodal Referring and Grounding Paper • 2401.13307 • Published Jan 24, 2024
Artemis: Towards Referential Understanding in Complex Videos Paper • 2406.00258 • Published Jun 1, 2024