
Manan Shah

cs-mshah

https://cs-mshah.github.io/

AI & ML interests

Computer Vision

Recent Activity

upvoted a paper 1 day ago

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

liked a model 2 days ago

rednote-hilab/dots.ocr

liked a Space 4 days ago

HuggingFaceTB/smol-training-playbook

Organizations

upvoted a paper 1 day ago

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Paper • 2601.11087 • Published 5 days ago • 8

upvoted a paper 4 days ago

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published 7 days ago • 177

upvoted 2 papers 8 days ago

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Paper • 2601.05175 • Published 13 days ago • 32

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published 10 days ago • 202

upvoted a paper 12 days ago

Choreographing a World of Dynamic Objects

Paper • 2601.04194 • Published 14 days ago • 12

upvoted a paper 14 days ago

VINCIE: Unlocking In-context Image Editing from Video

Paper • 2506.10941 • Published Jun 12, 2025 • 4

upvoted an article 14 days ago

Generalist Robot Policy Evaluation in Simulation with NVIDIA Isaac Lab-Arena and LeRobot

Article • 15 days ago

upvoted an article 15 days ago

NVIDIA Cosmos Reason 2 Brings Advanced Reasoning To Physical AI

Article • 15 days ago

upvoted a paper 16 days ago

Evaluating Parameter Efficient Methods for RLVR

Paper • 2512.23165 • Published 23 days ago • 25

upvoted 2 papers 21 days ago

ProEdit: Inversion-based Editing From Prompts Done Right

Paper • 2512.22118 • Published 26 days ago • 17

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published 23 days ago • 64

upvoted a paper 25 days ago

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Paper • 2512.20557 • Published 29 days ago • 49

upvoted an article 27 days ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Article • Jun 3, 2025 • 311

upvoted 2 articles about 2 months ago

We Got Claude to Fine-Tune an Open Source LLM

Article • Dec 4, 2025 • 579

Continuous batching from first principles

Article • Nov 25, 2025 • 306

upvoted 2 collections 2 months ago

MetaCLIP2 Multilingual

Collection

8 items • Updated Nov 12, 2025 • 16

📄 FinePDFs

Collection

82 items • Updated 11 days ago • 27

upvoted a paper 3 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 120

upvoted an article 4 months ago

Metric and Relative Monocular Depth Estimation: An Overview. Fine-Tuning Depth Anything V2 👐 📚

Article • Jul 10, 2024

upvoted a paper 4 months ago

Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 51
