Hi Skyvern maintainers,
We maintain browseruse-agent-bench, a reproducible evaluation framework for browser agents. It includes LexBench-Browser, a public dataset with 210 no-login tasks across 107 real websites.
We already support running skyvern in the framework and would like to invite maintainers or power users to review the integration or submit an official result.
The goal is not to claim a ranking from our side, but to make the result reproducible and fair:
- confirm the recommended Skyvern version or commit
- record model provider and model ID
- record browser backend and launch mode
- include proxy or region notes if used
- include task-level artifacts
- disclose known skips, retries, provider incidents, or browser constraints
- allow maintainer review and rerun before listing as official
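To make the checklist above concrete, here is a minimal sketch of what a result metadata record could look like. Every name in it (`ResultMetadata`, its fields, `missing_fields`, `is_official`) is hypothetical and chosen for illustration only; it is not the actual schema of browseruse-agent-bench, and the placeholder values are not real Skyvern settings.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ResultMetadata:
    """Hypothetical record mirroring the checklist; not the framework's real schema."""

    agent: str
    agent_version: str                      # release tag or commit hash
    model_provider: str
    model_id: str
    browser_backend: str
    launch_mode: str                        # e.g. "headless" or "headed"
    proxy_or_region: Optional[str] = None   # only filled in if a proxy or region was used
    artifacts_dir: Optional[str] = None     # where task-level artifacts are stored
    disclosures: List[str] = field(default_factory=list)  # skips, retries, incidents
    maintainer_reviewed: bool = False

    def missing_fields(self) -> List[str]:
        """Return the names of required string fields that are empty."""
        required = [
            "agent", "agent_version", "model_provider",
            "model_id", "browser_backend", "launch_mode",
        ]
        return [name for name in required if not getattr(self, name).strip()]

    def is_official(self) -> bool:
        """A result counts as official only when complete and maintainer-reviewed."""
        return not self.missing_fields() and self.maintainer_reviewed

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the run artifacts."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values only; a real submission would use the actual configuration.
example = ResultMetadata(
    agent="skyvern",
    agent_version="<commit-or-tag>",
    model_provider="<provider>",
    model_id="<model-id>",
    browser_backend="<backend>",
    launch_mode="headless",
    disclosures=["<known skips or retries, if any>"],
)
print(example.to_json())
print("official:", example.is_official())
```

In this sketch a result stays unofficial until a maintainer has reviewed it, which matches the last item of the checklist.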
Would you be open to reviewing the integration or submitting an official Skyvern result? If there is a preferred configuration for benchmarking Skyvern, we would be happy to reflect that in the result metadata.
Thanks!
LexBench Team
lexbench@lexmount.com