grad_norm is close to 0 while loss remains normal, validation performance stagnates on Ascend 910C #38

@ZtZhang-SCUT

Description

Hi authors,
Thank you for the impressive work on SDPO. I am currently trying to reproduce the results on the Tool Use task using the official codebase and the default configuration. My experiments run on Ascend 910C accelerators.

Observations:

(screenshot: sdpo_loss and grad_norm training curves)

Loss vs. Gradient Norm Mismatch: The sdpo_loss stays within a seemingly normal range (typically ~0.0-0.2), which initially suggests stable optimization. However, grad_norm consistently drops to ~1e-5, orders of magnitude smaller than the values reported in Figure 18 of the paper (where it hovers between 0 and 20, although that run is on LCBv6) and in the wandb log.

(screenshot: validation metric curves)

Validation Performance: Correspondingly, the validation metrics (accuracy / pass rate) fluctuate randomly with no upward trend, indicating that the model parameters are effectively not being updated.
Stability Concern: The collapse in gradient flow happens early and persists, suggesting the model is stuck on a plateau rather than converging.
Questions:
Is a grad_norm close to 0 expected behavior in SDPO, or does it indicate a gradient vanishing/collapse issue? The training curves in the paper show significantly larger norms.
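To rule out a misunderstanding of the metric itself, here is a minimal pure-Python sketch of how I assume grad_norm is aggregated (the global L2 norm over all parameter gradients, as PyTorch's clip_grad_norm_ reports it). The layer names and values are hypothetical, not taken from the actual model:

```python
import math

def global_grad_norm(grads):
    """Global L2 norm across all parameter gradients:
    sqrt of the sum of squared elements over every tensor."""
    total = 0.0
    for g in grads.values():
        total += sum(x * x for x in g)
    return math.sqrt(total)

# Hypothetical per-layer gradients: one near-zero layer, one healthy layer.
grads = {
    "layer1.weight": [1e-5, -2e-5, 3e-5],
    "layer2.weight": [0.5, -0.25],
}
# The global norm is dominated by the largest per-layer contribution,
# so a reported ~1e-5 implies *every* layer's gradient is near zero.
print(global_grad_norm(grads))
```

Since the global norm is dominated by the largest layer, a reported grad_norm of ~1e-5 would mean the gradients are tiny everywhere, not just in a few layers, which is why I suspect a genuine vanishing issue rather than a logging artifact.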

Any insights or suggestions would be greatly appreciated!

Environment:
Hardware: Ascend 910C
Model: Olmo3-7b
Task: Tool Use
Framework: PyTorch + CANN / verl
Config: Default settings from the repo
