RoboReward: General-Purpose Vision-Language Reward Models for Robotics
Paper: https://arxiv.org/abs/2601.00675
RoboReward provides a general-purpose vision-language reward model for robotics, trained on the RoboReward dataset with Qwen3-VL to predict discrete end-of-episode progress rewards from real-robot rollout videos.
Given a task instruction and a rollout video, the model predicts an end-of-episode progress score:
Follow the original Qwen3-VL instructions for video input and use a text prompt like this:
Given the task, assign a discrete progress score (1, 2, 3, 4, 5) for the robot in the video in the format: ANSWER: <score>
Rubric for end-of-episode progress (judge only the final state without time limits):
1 - No Success: Final state shows no goal-relevant change for the command.
2 - Minimal Progress: Final state shows a small but insufficient change toward the goal.
3 - Partial Completion: Final state shows good progress toward the goal but violates a major requirement or more than one requirement.
4 - Near Completion: Final state is correct in region and intent but misses a single minor requirement.
5 - Perfect Completion: Final state satisfies all requirements.
Task: <INSERT TASK HERE>
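The prompt above can be assembled and the model's answer parsed with a small helper. This is a sketch: the function names are illustrative and not part of the released code, and the template simply mirrors the rubric on this card.

```python
import re

# Illustrative helpers (not from the RoboReward release): build the rubric
# prompt for a given task, and extract the discrete reward from a model
# response that follows the requested "ANSWER: <score>" format.

PROMPT_TEMPLATE = """\
Given the task, assign a discrete progress score (1, 2, 3, 4, 5) for the robot in the video in the format: ANSWER: <score>
Rubric for end-of-episode progress (judge only the final state without time limits):
1 - No Success: Final state shows no goal-relevant change for the command.
2 - Minimal Progress: Final state shows a small but insufficient change toward the goal.
3 - Partial Completion: Final state shows good progress toward the goal but violates a major requirement or more than one requirement.
4 - Near Completion: Final state is correct in region and intent but misses a single minor requirement.
5 - Perfect Completion: Final state satisfies all requirements.
Task: {task}"""


def build_prompt(task: str) -> str:
    """Fill the task instruction into the rubric prompt."""
    return PROMPT_TEMPLATE.format(task=task)


def parse_score(response: str):
    """Return the 1-5 reward parsed from a model response, or None if absent."""
    match = re.search(r"ANSWER:\s*([1-5])", response)
    return int(match.group(1)) if match else None
```

In practice, `build_prompt(...)` would be passed as the text turn alongside the rollout video, following the standard Qwen3-VL video-input chat template, and `parse_score` applied to the generated text.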
@misc{lee2026roborewardgeneralpurposevisionlanguagereward,
  title={RoboReward: General-Purpose Vision-Language Reward Models for Robotics},
  author={Tony Lee and Andrew Wagenmaker and Karl Pertsch and Percy Liang and Sergey Levine and Chelsea Finn},
  year={2026},
  eprint={2601.00675},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2601.00675},
}