Quick Start

To use this model, follow these three steps:

  1. Clone verl-agent.
  2. Set actor_rollout_ref.model.path to your local path, e.g. your/own/path/GiGPO-Qwen2.5-7B-Instruct-WebShop.
  3. Ensure trainer.val_before_train=True, so evaluation runs before training.
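
The two settings in steps 2 and 3 can be sketched as key=value overrides. This is a minimal illustration, assuming verl-agent accepts dotted key=value overrides on its launch command; the helper name build_overrides is hypothetical, and the launch command itself is left to the scripts in the verl-agent repository.

```python
def build_overrides(model_path: str) -> list[str]:
    """Return the two overrides named in the steps above as key=value strings.

    build_overrides is a hypothetical helper for illustration only; it is not
    part of verl-agent.
    """
    return [
        # Step 2: point the actor/rollout/reference model at the local checkpoint.
        f"actor_rollout_ref.model.path={model_path}",
        # Step 3: run evaluation once before any training step.
        "trainer.val_before_train=True",
    ]


if __name__ == "__main__":
    # Print the overrides so they can be appended to the launch command.
    print(" ".join(build_overrides("your/own/path/GiGPO-Qwen2.5-7B-Instruct-WebShop")))
```

Keeping the overrides in one place makes it harder to forget trainer.val_before_train=True when only evaluation results are wanted.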

For more details, see the verl-agent repository.


Notes

GiGPO-Qwen2.5-7B-Instruct-WebShop was trained with GiGPO using the following prompt templates (the first without interaction history, the second with it):

WEBSHOP_TEMPLATE_NO_HIS = """
You are an expert autonomous agent operating in the WebShop e‑commerce environment. 
Your task is to: {task_description}.
Your current observation is: {current_observation}.
Your admissible actions of the current situation are: 
[
{available_actions}
].

Now it's your turn to take one action for the current step.
You should first reason step-by-step about the current situation, then think carefully which admissible action best advances the shopping goal. This reasoning process MUST be enclosed within <think> </think> tags. 
Once you've finished your reasoning, you should choose an admissible action for current step and present it within <action> </action> tags.
"""

WEBSHOP_TEMPLATE = """
You are an expert autonomous agent operating in the WebShop e‑commerce environment.
Your task is to: {task_description}.
Prior to this step, you have already taken {step_count} step(s). Below are the most recent {history_length} observations and the corresponding actions you took: {action_history}
You are now at step {current_step} and your current observation is: {current_observation}.
Your admissible actions of the current situation are: 
[
{available_actions}
].

Now it's your turn to take one action for the current step.
You should first reason step-by-step about the current situation, then think carefully which admissible action best advances the shopping goal. This reasoning process MUST be enclosed within <think> </think> tags. 
Once you've finished your reasoning, you should choose an admissible action for current step and present it within <action> </action> tags.
"""

Model size: 8B params (Safetensors)
Tensor type: BF16

Model: langfeng01/GiGPO-Qwen2.5-7B-Instruct-WebShop
Base model: Qwen/Qwen2.5-7B