Mkhona/hyperball fix #158
Conversation
Signed-off-by: mikail <mkhona@nvidia.com>
Greptile Summary

This PR removes the auto-rescaling initialization logic from MuonHyperball and instead enforces a strict pre-condition: every parameter must already have norm equal to hyperball_radius. The only remaining finding is a minor hardening gap: hyperball_radius is not validated to be positive.

Confidence Score: 5/5. Safe to merge; all remaining findings are P2 style/hardening suggestions that do not affect correctness. The core logic change is correct: removing auto-rescaling and enforcing a strict pre-condition keeps the invariant clear. The only gap is the missing positive-value guard on hyperball_radius. No files require special attention.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant MuonHyperball
    participant OrthogonalizedOptimizer
    User->>MuonHyperball: __init__(hyperball_radius=R)
    MuonHyperball->>MuonHyperball: store hyperball_eps, hyperball_radius
    MuonHyperball->>OrthogonalizedOptimizer: super().__init__()
    MuonHyperball->>MuonHyperball: validate each p_norm == R (torch.isclose)
    note over MuonHyperball: Raises ValueError if p_norm=0 or p_norm≠R
    User->>OrthogonalizedOptimizer: step()
    OrthogonalizedOptimizer->>OrthogonalizedOptimizer: _init_group() — init momentum_buffer
    loop for each parameter p with grad
        OrthogonalizedOptimizer->>OrthogonalizedOptimizer: apply weight decay
        OrthogonalizedOptimizer->>OrthogonalizedOptimizer: update momentum buffer
        OrthogonalizedOptimizer->>OrthogonalizedOptimizer: orthogonalize (Newton-Schulz)
        OrthogonalizedOptimizer->>MuonHyperball: pre_weight_update_fn_inplace(p, update)
        note over MuonHyperball: Lazy-init hyperball_R tensor in state[p]<br/>Normalise update → R·normalize(update)
        OrthogonalizedOptimizer->>OrthogonalizedOptimizer: p -= lr · update
        OrthogonalizedOptimizer->>MuonHyperball: post_weight_update_fn_inplace(p)
        note over MuonHyperball: Re-project p onto hypersphere:<br/>p = R · normalize(p)
    end
```
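The re-projection step shown in the final note of the diagram can be sketched in plain Python. This is a minimal sketch with hypothetical names: `reproject_onto_hypersphere` stands in for `post_weight_update_fn_inplace`, and a hand-rolled Euclidean norm stands in for torch's normalize.

```python
import math

def reproject_onto_hypersphere(p, radius, eps=1e-8):
    # Hypothetical stand-in for post_weight_update_fn_inplace:
    # p <- radius * p / (||p|| + eps)
    norm = math.sqrt(sum(x * x for x in p))
    return [radius * x / (norm + eps) for x in p]

# A vector of norm 5 is pulled back onto the radius-2 hypersphere.
p = reproject_onto_hypersphere([3.0, 4.0], radius=2.0)
new_norm = math.sqrt(sum(x * x for x in p))
print(round(new_norm, 6))  # 2.0
```

The eps term only guards against division by zero; since the optimizer rejects zero-norm parameters up front, it barely perturbs the result in practice.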
Reviews (1): Last reviewed commit: "linting"
```diff
 def __init__(
     self,
     *args: Any,
     hyperball_eps: float = 1e-8,
-    hyperball_radius: float | None = None,
+    hyperball_radius: float,
     **kwargs: Any,
 ) -> None:
     self.hyperball_eps = hyperball_eps
     self.hyperball_radius = hyperball_radius
     super().__init__(*args, **kwargs)
```
No validation that hyperball_radius is positive

The parameter is now required but has no guard against non-positive values. If hyperball_radius <= 0 is passed, the p_norm == 0 check at line 77 won't help (parameter norms are always ≥ 0, and isclose with a negative target always fails), so the user receives a confusing norm-mismatch error instead of a clear message about an invalid radius.
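The failure mode can be demonstrated with a minimal pure-Python sketch, with math.isclose standing in as an analogy for torch.isclose:

```python
import math

radius = -1.0  # invalid, but currently accepted by __init__ unguarded
p_norm = 1.0   # parameter norms are always >= 0

# The zero-norm check passes...
zero_check_ok = p_norm != 0
# ...but no non-negative norm can ever be close to a negative radius,
# so validation fails with a norm-mismatch error rather than a clear
# "radius must be positive" message.
matches_radius = math.isclose(p_norm, radius)
print(zero_check_ok, matches_radius)  # True False
```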
Suggested change:

```diff
 def __init__(
     self,
     *args: Any,
     hyperball_eps: float = 1e-8,
     hyperball_radius: float,
     **kwargs: Any,
 ) -> None:
     self.hyperball_eps = hyperball_eps
     self.hyperball_radius = hyperball_radius
+    if hyperball_radius <= 0:
+        raise ValueError(
+            f"hyperball_radius must be positive, got {hyperball_radius}."
+        )
     super().__init__(*args, **kwargs)
```
Add a default value, negative check is optional.
```diff
-        self.state[p]["hyperball_R"] = R
+        if "hyperball_R" not in self.state[p]:
+            self.state[p]["hyperball_R"] = torch.tensor(self.hyperball_radius, dtype=p.dtype, device=p.device)
+        R = self.state[p]["hyperball_R"]
```
Probably missed it in the previous review: hyperball_R and hyperball_radius are inconsistent; hyperball_radius should be used for both.
```diff
                 "MuonHyperball requires all parameters to have non-zero norm. "
                 "Found parameter with zero norm."
             )
+        if not torch.isclose(
```
Inconsistent with the p_norm == 0 check above; torch.equal could be used instead.
NOTE: this is potentially a regression rather than an optimization, since more host-device syncs can be triggered. In the worst case, a cudaMalloc.
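The exact-vs-tolerant trade-off behind this comment can be illustrated with plain floats; here math.isclose and == serve only as analogies for torch.isclose and torch.equal respectively:

```python
import math

p_norm = 0.1 + 0.2  # 0.30000000000000004 under IEEE-754 arithmetic
target = 0.3

# Exact comparison (analogous to torch.equal): sensitive to float error.
exact = p_norm == target
# Tolerance-based comparison (analogous to torch.isclose): robust to it.
close = math.isclose(p_norm, target)
print(exact, close)  # False True
```

Exact comparison is cheaper but brittle under floating-point arithmetic, which is why the radius check uses a tolerance while the zero-norm check can compare exactly.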
Addressed #155 (comment) and removed some of the initialization logic.