Jaykumaran R

Jaykumaran17 · Jaykumaran

AI & ML interests

None yet

Organizations

None yet

upvoted 2 papers 6 months ago

Unified Vision-Language-Action Model

Paper • 2506.19850 • Published Jun 24 • 27

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Paper • 2506.15681 • Published Jun 18 • 39

upvoted a collection 6 months ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 550

upvoted 2 articles 7 months ago

Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm

Article • Jun 11 • 121

Introducing Training Cluster as a Service - a new collaboration with NVIDIA

Article • Jun 11 • 26

upvoted a collection 7 months ago

Vision Language Models Papers 🖼️💬📝

Papers about vision-language models, most important ones are on top of the list. • 27 items • Updated Apr 30, 2024 • 40

upvoted a paper 7 months ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 147

upvoted a collection 7 months ago

SmolVLA

Small, efficient and light-weight VLAs pretrained on community datasets • 1 item • Updated Sep 5 • 31

upvoted a paper 7 months ago

Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33

upvoted 2 articles 7 months ago

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

Article • Jun 3 • 299

Vision Language Models (Better, faster, stronger)

Article • May 12 • 572

upvoted a collection 7 months ago

NVILA

11 items • Updated Sep 13 • 16

upvoted 2 articles 8 months ago

SmolLM - blazingly fast and remarkably powerful

Article • Jul 16, 2024 • 436

Visually Multilingual: Introducing mcdse-2b

Article • Oct 27, 2024 • 41

upvoted a collection 8 months ago

Multimodal DSE Retrievers

A collection of DSE models for multimodal retrieval • 5 items • Updated Apr 15 • 15

upvoted 3 articles 9 months ago

NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets

Article • Mar 18 • 42

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

Article • Feb 4 • 186

SmolVLM - small yet mighty Vision Language Model

Article • Nov 26, 2024 • 396

upvoted a paper 9 months ago

VGGT: Visual Geometry Grounded Transformer

Paper • 2503.11651 • Published Mar 14 • 35

upvoted an article 9 months ago

Open-Source Handwritten Signature Detection Model

Article • Mar 14 • 120