Commit · a17c3c8
1 Parent(s): e3f1cee
Update README with eval details
results/OrbyAgent-ActIO-72b/README.md
CHANGED
@@ -2,4 +2,6 @@
 
 This agent is developed by [Orby AI](https://www.orby.ai/).
 
+The agent does not use any benchmark-specific information in the prompts. For WebArena benchmark, we use the original evaluator and task definitions for fair comparison.
+
 It uses the ActIO model of 72B parameters as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().
results/OrbyAgent-Claude-3.5-Sonnet/README.md
CHANGED
@@ -2,4 +2,6 @@
 
 This agent is developed by [Orby AI](https://www.orby.ai/).
 
+The agent does not use any benchmark-specific information in the prompts. For WebArena benchmark, we use the original evaluator and task definitions for fair comparison.
+
 It uses Claude-3.5-sonnet-20241022 as a backend, with both screenshot and HTML as inputs. More details can be found in our [research blog]().