Skip to content

Expose frequency and presence penalty in inference configs for use with ns eval#1383

Draft
shuoyangd wants to merge 1 commit intomainfrom
shuoyangd/expose_frequency_presence_penalty
Draft

Expose frequency and presence penalty in inference configs for use with ns eval#1383
shuoyangd wants to merge 1 commit intomainfrom
shuoyangd/expose_frequency_presence_penalty

Conversation

@shuoyangd
Copy link
Copy Markdown
Collaborator

@shuoyangd shuoyangd commented Apr 17, 2026

Those parameters are necessary for Qwen3.5 inference. #1329 added them for underlying functions that build vllm/megatron requests, but they are not exposed in the inference config for commands like ns eval. This PR addresses that.

Summary by CodeRabbit

  • New Features
    • Added frequency_penalty and presence_penalty parameters to inference configuration, enabling fine-tuned control over model token generation behavior. Users can now reduce repetitive outputs and improve response diversity when supported by the underlying model.

…th ns eval

Signed-off-by: Shuoyang Ding <shuoyangd@nvidia.com>
@shuoyangd shuoyangd requested review from Kipok and wasiahmad April 17, 2026 21:45
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

📝 Walkthrough

Walkthrough

Two new inference control parameters, frequency_penalty and presence_penalty, are added to the inference configuration interface and model generation method. Both default to 0.0 and are propagated through the inference pipeline to the underlying LLM completion endpoints.

Changes

Cohort / File(s) Summary
Inference Configuration
nemo_skills/inference/generate.py
Added frequency_penalty and presence_penalty fields to InferenceConfig dataclass, each with default value of 0.0.
Model Generation Interface
nemo_skills/inference/model/base.py
Extended BaseModel.generate_async() method signature with frequency_penalty and presence_penalty parameters (both defaulting to 0.0), included in request kwargs passed to litellm completion endpoints.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: exposing frequency and presence penalty parameters in inference configs for use with ns eval, which matches the file modifications in InferenceConfig and BaseModel.generate_async().

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch shuoyangd/expose_frequency_presence_penalty

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/inference/model/base.py`:
- Around line 249-250: The generate_async signature on BaseModel was extended to
include frequency_penalty and presence_penalty, but the override in
code_execution.py (the generate_async async method) lacks those params and
doesn't accept **kwargs; update the code_execution.py generate_async to include
frequency_penalty: float = 0.0 and presence_penalty: float = 0.0 in its
parameter list (or add **kwargs) and ensure those values are propagated into any
generation_params or forwarded calls used by that method (adjust the forwarding
in the generate_async implementation so generation_params receives
frequency_penalty and presence_penalty).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 87d87134-3406-4572-a5c1-620eaa8ef1db

📥 Commits

Reviewing files that changed from the base of the PR and between 019aa41 and 316bacb.

📒 Files selected for processing (2)
  • nemo_skills/inference/generate.py
  • nemo_skills/inference/model/base.py

Comment on lines +249 to +250
frequency_penalty: float = 0.0,
presence_penalty: float = 0.0,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Update all generate_async overrides to match the new method contract.

Line 249-Line 250 expands BaseModel.generate_async(...), but nemo_skills/inference/model/code_execution.py:250-275 still defines an override without these args and no **kwargs. With generation_params now including both fields, that path will fail with TypeError: unexpected keyword argument ....

Run this read-only check to find overrides that still miss the new parameters:

#!/bin/bash
python - <<'PY'
import ast
from pathlib import Path

root = Path("nemo_skills/inference/model")
required = {"frequency_penalty", "presence_penalty"}

for path in sorted(root.rglob("*.py")):
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except Exception as e:
        print(f"[parse-error] {path}: {e}")
        continue

    for node in ast.walk(tree):
        if isinstance(node, ast.AsyncFunctionDef) and node.name == "generate_async":
            params = [a.arg for a in node.args.args]
            has_var_kwargs = node.args.kwarg is not None
            missing = sorted(required - set(params))
            if missing and not has_var_kwargs:
                print(f"{path}:{node.lineno} missing={missing} no_**kwargs")
PY
Suggested fix direction
# In nemo_skills/inference/model/code_execution.py generate_async signature
 async def generate_async(
     self,
     prompt: str | list[dict],
@@
     repetition_penalty: float = 1.0,
+    frequency_penalty: float = 0.0,
+    presence_penalty: float = 0.0,
     random_seed: int = 0,
@@
 ) -> list[dict]:

Also propagate both values wherever that override forwards generation args to model calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/inference/model/base.py` around lines 249 - 250, The
generate_async signature on BaseModel was extended to include frequency_penalty
and presence_penalty, but the override in code_execution.py (the generate_async
async method) lacks those params and doesn't accept **kwargs; update the
code_execution.py generate_async to include frequency_penalty: float = 0.0 and
presence_penalty: float = 0.0 in its parameter list (or add **kwargs) and ensure
those values are propagated into any generation_params or forwarded calls used
by that method (adjust the forwarding in the generate_async implementation so
generation_params receives frequency_penalty and presence_penalty).

@shuoyangd shuoyangd marked this pull request as draft April 17, 2026 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant