
feat: add aime26 benchmark (symbolic-only, MathArena source)#1123

Merged: gwarmstrong merged 2 commits into NVIDIA-NeMo:main from gwarmstrong:georgea/migrate-gym-aime26 on Apr 25, 2026

@gwarmstrong (Contributor)


Migrates aime26 from NeMo Skills (`nemo_skills/dataset/aime26/`) into Gym.
It reuses the existing `math_with_judge` resource server with its default
`should_use_judge: false`, matching the symbolic-only verification of Skills'
`eval_type=math`. The benchmark is structurally identical to aime25 and aime24.

- 30 problems from MathArena/aime_2026 (Hugging Face)
- Deterministic math_verify grading via math_with_judge
- Same prompt as aime25 (boxed-answer system prompt)
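
For reviewers unfamiliar with the boxed-answer flow, the sketch below illustrates the idea of deterministic, judge-free grading. It is a minimal stdlib-only illustration, not `math_verify` and not the `math_with_judge` server code: the function names `extract_last_boxed` and `grade_aime` are hypothetical, and it assumes (as holds for AIME) that every answer is an integer from 0 to 999, so a plain integer comparison stands in for `math_verify`'s symbolic equivalence check.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text (hypothetical helper).

    Tracks brace depth so nested braces such as \\boxed{{70}} are handled.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace of \boxed{ found.
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # Unbalanced braces: treat as no answer.


def grade_aime(response: str, expected: str) -> bool:
    """Deterministic pass/fail, assuming integer answers (AIME convention)."""
    answer = extract_last_boxed(response)
    if answer is None:
        return False
    try:
        # int() tolerates leading zeros and whitespace, so "070" == "70".
        return int(answer.strip()) == int(expected.strip())
    except ValueError:
        # Non-integer content (e.g. an expression) fails in this simplified sketch.
        return False
```

Taking the last `\boxed{}` mirrors the common convention that the model's final boxed expression is its answer; a real symbolic verifier would also accept equivalent non-integer forms, which this sketch deliberately does not.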

Signed-off-by: gwarmstrong <gwarmstrong@users.noreply.github.com>
copy-pr-bot (bot) commented Apr 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gwarmstrong merged commit a1daea0 into NVIDIA-NeMo:main on Apr 25, 2026 (6 checks passed).

2 participants