Expose frequency and presence penalty in inference configs for use with ns eval#1383
Expose frequency and presence penalty in inference configs for use with ns eval#1383
Conversation
…th ns eval Signed-off-by: Shuoyang Ding <shuoyangd@nvidia.com>
📝 WalkthroughWalkthroughTwo new inference control parameters, Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/inference/model/base.py`:
- Around line 249-250: The generate_async signature on BaseModel was extended to
include frequency_penalty and presence_penalty, but the override in
code_execution.py (the generate_async async method) lacks those params and
doesn't accept **kwargs; update the code_execution.py generate_async to include
frequency_penalty: float = 0.0 and presence_penalty: float = 0.0 in its
parameter list (or add **kwargs) and ensure those values are propagated into any
generation_params or forwarded calls used by that method (adjust the forwarding
in the generate_async implementation so generation_params receives
frequency_penalty and presence_penalty).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 87d87134-3406-4572-a5c1-620eaa8ef1db
📒 Files selected for processing (2)
nemo_skills/inference/generate.pynemo_skills/inference/model/base.py
| frequency_penalty: float = 0.0, | ||
| presence_penalty: float = 0.0, |
There was a problem hiding this comment.
Update all generate_async overrides to match the new method contract.
Line 249-Line 250 expands BaseModel.generate_async(...), but nemo_skills/inference/model/code_execution.py:250-275 still defines an override without these args and no **kwargs. With generation_params now including both fields, that path will fail with TypeError: unexpected keyword argument ....
Run this read-only check to find overrides that still miss the new parameters:
#!/bin/bash
python - <<'PY'
import ast
from pathlib import Path
root = Path("nemo_skills/inference/model")
required = {"frequency_penalty", "presence_penalty"}
for path in sorted(root.rglob("*.py")):
try:
tree = ast.parse(path.read_text(encoding="utf-8"))
except Exception as e:
print(f"[parse-error] {path}: {e}")
continue
for node in ast.walk(tree):
if isinstance(node, ast.AsyncFunctionDef) and node.name == "generate_async":
params = [a.arg for a in node.args.args]
has_var_kwargs = node.args.kwarg is not None
missing = sorted(required - set(params))
if missing and not has_var_kwargs:
print(f"{path}:{node.lineno} missing={missing} no_**kwargs")
PYSuggested fix direction
# In nemo_skills/inference/model/code_execution.py generate_async signature
async def generate_async(
self,
prompt: str | list[dict],
@@
repetition_penalty: float = 1.0,
+ frequency_penalty: float = 0.0,
+ presence_penalty: float = 0.0,
random_seed: int = 0,
@@
) -> list[dict]:Also propagate both values wherever that override forwards generation args to model calls.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/inference/model/base.py` around lines 249 - 250, The
generate_async signature on BaseModel was extended to include frequency_penalty
and presence_penalty, but the override in code_execution.py (the generate_async
async method) lacks those params and doesn't accept **kwargs; update the
code_execution.py generate_async to include frequency_penalty: float = 0.0 and
presence_penalty: float = 0.0 in its parameter list (or add **kwargs) and ensure
those values are propagated into any generation_params or forwarded calls used
by that method (adjust the forwarding in the generate_async implementation so
generation_params receives frequency_penalty and presence_penalty).
Those parameters are necessary for Qwen3.5 inference. #1329 added them for underlying functions that build vllm/megatron requests, but they are not exposed in the inference config for commands like
ns eval. This PR addresses that.Summary by CodeRabbit
frequency_penaltyandpresence_penaltyparameters to inference configuration, enabling fine-tuned control over model token generation behavior. Users can now reduce repetitive outputs and improve response diversity when supported by the underlying model.