Request for official LexBench-Browser result review/submission #5950

Julia-Lex · 2026-05-12T04:25:52Z

Hi Skyvern maintainers,

We maintain browseruse-agent-bench, a reproducible evaluation framework for browser agents. It includes LexBench-Browser, a public dataset with 210 no-login tasks across 107 real websites.

We already support running Skyvern in the framework and would like to invite maintainers or power users to review the integration or submit an official result.

The goal is not to claim a ranking from our side, but to make the result reproducible and fair:

- confirm the recommended Skyvern version or commit
- record model provider and model ID
- record browser backend and launch mode
- include proxy or region notes if used
- include task-level artifacts
- disclose known skips, retries, provider incidents, or browser constraints
- allow maintainer review and rerun before listing as official
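
As a rough illustration of the checklist above, the metadata for one result could be captured along these lines. This is only a sketch: the class, field, and method names are hypothetical and are not the schema used by browseruse-agent-bench, and the values in the usage note are placeholders. The result submission docs define the actual format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ResultMetadata:
    """Hypothetical record mirroring the checklist; not the framework's real schema."""

    agent_version: str        # recommended Skyvern version or commit
    model_provider: str       # who serves the model
    model_id: str             # exact model identifier
    browser_backend: str      # browser backend used for the run
    launch_mode: str          # how the browser was launched
    proxy_or_region_notes: str = ""                            # only if a proxy or region pin was used
    artifact_paths: list[str] = field(default_factory=list)   # task-level artifacts
    disclosures: list[str] = field(default_factory=list)      # skips, retries, incidents, constraints
    maintainer_reviewed: bool = False                          # set after maintainer review and rerun

    def missing_fields(self) -> list[str]:
        """Return the names of required string fields that are empty or blank."""
        required = (
            "agent_version",
            "model_provider",
            "model_id",
            "browser_backend",
            "launch_mode",
        )
        return [name for name in required if not getattr(self, name).strip()]

    def is_official_candidate(self) -> bool:
        """True only when required fields, artifacts, and maintainer review are all present."""
        return (
            not self.missing_fields()
            and bool(self.artifact_paths)
            and self.maintainer_reviewed
        )

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the run artifacts."""
        return json.dumps(asdict(self), indent=2)
```

For example, a record built with placeholder values such as `ResultMetadata("<commit>", "<provider>", "<model-id>", "<backend>", "<launch-mode>")` would report no missing fields, but `is_official_candidate()` would stay false until artifacts are attached and a maintainer has reviewed the run.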

Useful links:

- Framework repo: https://github.com/lexmount/browseruse-agent-bench
- Dataset: https://huggingface.co/datasets/Lexmount/LexBench-Browser
- Evaluation protocol: https://github.com/lexmount/browseruse-agent-bench/blob/main/EVALUATION_PROTOCOL.md
- Result submission docs: https://github.com/lexmount/browseruse-agent-bench/tree/main/community/results

Would you be open to reviewing the integration or submitting an official Skyvern result? If there is a preferred configuration for benchmarking Skyvern, we would be happy to reflect that in the result metadata.

Thanks!

LexBench Team
lexbench@lexmount.com
