Spaces: ServiceNow/browsergym-leaderboard

ligang-orby committed on Feb 21

Commit a17c3c8 (1 parent: e3f1cee)

Update README with eval details

Files changed (2)

results/OrbyAgent-ActIO-72b/README.md CHANGED

@@ -2,4 +2,6 @@
 This agent is developed by [Orby AI](https://www.orby.ai/).
 It uses the ActIO model of 72B parameters as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().

 This agent is developed by [Orby AI](https://www.orby.ai/).
+The agent does not use any benchmark-specific information in its prompts. For the WebArena benchmark, we use the original evaluator and task definitions to allow a fair comparison.
 It uses the ActIO model of 72B parameters as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().

results/OrbyAgent-Claude-3.5-Sonnet/README.md CHANGED

@@ -2,4 +2,6 @@
 This agent is developed by [Orby AI](https://www.orby.ai/).
 It uses Claude-3.5-sonnet-20241022 as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().

 This agent is developed by [Orby AI](https://www.orby.ai/).
+The agent does not use any benchmark-specific information in its prompts. For the WebArena benchmark, we use the original evaluator and task definitions to allow a fair comparison.
 It uses Claude-3.5-sonnet-20241022 as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().