Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 3 days ago • 44
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published 5 days ago • 271
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images Paper • 2604.09531 • Published 4 days ago • 7
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 5 days ago • 221
OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence Paper • 2604.07296 • Published 6 days ago • 34
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 6 days ago • 306
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 6 days ago • 160
Action Images: End-to-End Policy Learning via Multiview Video Generation Paper • 2604.06168 • Published 7 days ago • 13
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings Paper • 2604.04323 • Published 8 days ago • 38
Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision Paper • 2604.04934 • Published 8 days ago • 42
StyleRenderer 🎨: Generate stylized video from game G-buffer inputs Space • Running on Zero • Featured • 10
Token Warping Helps MLLMs Look from Nearby Viewpoints Paper • 2604.02870 • Published 11 days ago • 33
A Simple Baseline for Streaming Video Understanding Paper • 2604.02317 • Published 12 days ago • 72
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 12 days ago • 138
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding Paper • 2604.00528 • Published 13 days ago • 12