SALM Backend Support and HF ASR Leaderboard Evaluation Recipe#1344
Conversation
…en-2.5b) Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Force-pushed from 51e453a to 0f4a275.
Pull request overview
Adds SALM (Speech-Augmented Language Model) support to the unified inference server and introduces a one-command HuggingFace Open ASR Leaderboard evaluation recipe, alongside robustness improvements to ASR leaderboard dataset preparation and a WER-inflating bug fix in the NeMo ASR backend.
Changes:
- Added a new `salm` unified-server backend for NeMo SALM models using the chat-style `generate()` API.
- Fixed `nemo_asr` hypothesis parsing so empty transcriptions don't fall back to verbose object string representations.
- Added `run_hf_leaderboard.py` and updated ASR leaderboard dataset preparation for improved maintainability/robustness.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| recipes/multimodal/server/backends/salm_backend.py | New SALM backend implementing audio-to-text via SALM generate() with temp audio files. |
| recipes/multimodal/server/backends/nemo_asr_backend.py | Fixes hypothesis parsing to avoid returning a large str(hyp) when transcript text is empty. |
| recipes/multimodal/server/backends/__init__.py | Registers the new salm backend for lazy loading. |
| recipes/asr/run_hf_leaderboard.py | Adds a recipe script to run HF Open ASR Leaderboard evals against the unified server. |
| nemo_skills/dataset/asr-leaderboard/prepare.py | Refactors dataset download/formatting and audio handling for leaderboard prep. |
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Split ASR dataset preparation into separate audio extraction and JSONL formatting, removed duration filtering, added a HuggingFace ASR leaderboard runner, introduced and registered a new SALM inference backend (config, model load/warmup, validation, generation), and tightened nemo_asr hypothesis text selection logic.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant SALMBackend
    participant AudioFS as Audio I/O
    participant SALMModel as SALM Model
    participant Tokenizer
    Client->>SALMBackend: generate(requests)
    SALMBackend->>SALMBackend: validate_request(requests)
    alt validation passes
        SALMBackend->>AudioFS: write audio_bytes -> temp .wav files
        SALMBackend->>SALMBackend: build SALM chat prompts (include audio_locator_tag)
        SALMBackend->>SALMModel: generate(prompts, max_new_tokens)
        SALMModel-->>SALMBackend: token sequences per input
        SALMBackend->>Tokenizer: decode tokens -> text
        Tokenizer-->>SALMBackend: decoded text
        SALMBackend->>SALMBackend: assemble GenerationResult per request (text, tokens, timing, metadata)
    else validation fails
        SALMBackend->>SALMBackend: produce per-request error GenerationResult
    end
    SALMBackend->>AudioFS: cleanup temp files
    SALMBackend-->>Client: List[GenerationResult]
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (3)
recipes/asr/run_hf_leaderboard.py (1)

Line 41: Shadowing the `eval` builtin is a library API design issue. The import `from nemo_skills.pipeline.cli import eval` shadows Python's built-in `eval()`. This is flagged by static analysis, but since `nemo_skills.pipeline.cli` exports this name, the issue is upstream in the library's API design rather than in this script. Consider aliasing if the built-in is needed locally: `from nemo_skills.pipeline.cli import eval as run_eval, wrap_arguments`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/asr/run_hf_leaderboard.py` at line 41, the import shadows Python's builtin eval; change the import line that brings in nemo_skills.pipeline.cli.eval to use an alias (e.g., import eval as run_eval) and update all usages of eval in this module to call run_eval instead; keep wrap_arguments as-is and ensure any references to the original eval symbol (function or variable) are replaced with the new alias to avoid builtin shadowing.

recipes/multimodal/server/backends/salm_backend.py (2)
Lines 132-147: Consider extracting shared audio validation to a base class. `_get_request_audio_bytes` and `validate_request` are identical to `NeMoASRBackend`. This duplication could be extracted to a shared mixin or utility in `base.py` for audio-input backends.

♻️ Potential shared utility in base.py

```python
# In base.py, add a mixin or utility:
class AudioInputMixin:
    """Shared utilities for backends accepting audio input."""

    def _get_request_audio_bytes(self, request: GenerationRequest) -> bytes:
        if request.audio_bytes:
            return request.audio_bytes
        if request.audio_bytes_list:
            if len(request.audio_bytes_list) > 1:
                raise ValueError(f"{self.name} backend currently supports one audio input per request.")
            return request.audio_bytes_list[0]
        raise ValueError("Request must contain audio_bytes/audio_bytes_list")

    def validate_request(self, request: GenerationRequest) -> Optional[str]:
        has_audio = request.audio_bytes is not None or (
            request.audio_bytes_list is not None and len(request.audio_bytes_list) > 0
        )
        if not has_audio:
            return f"{self.name} backend requires audio input"
        return None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/salm_backend.py` around lines 132-147, extract the duplicated audio-request logic into a shared mixin in base.py (e.g., AudioInputMixin) and have both the SALM backend and NeMoASRBackend inherit/use it: move the implementations of _get_request_audio_bytes and validate_request into AudioInputMixin (using self.name for backend-specific messages) and replace the local methods in the SALM backend so they defer to the mixin implementation; ensure the mixin raises the same errors and preserves the single-audio constraint and message text.
Lines 84-104: Add explicit device mapping to SALM model loading. NeMoASRBackend passes `map_location=torch.device(...)` to `restore_from()` and `from_pretrained()` to control device placement during model loading, while this implementation relies on `.to(device)` afterward. This may cause temporary memory spikes on the default device before the explicit device transfer. Consider following the same pattern as NeMoASRBackend:

```python
map_location = torch.device(self.config.device)
if Path(model_ref).exists():
    self._model = SALM.restore_from(model_ref, map_location=map_location)
else:
    self._model = SALM.from_pretrained(model_ref, map_location=map_location)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/salm_backend.py` around lines 84-104, the load_model method currently moves the SALM instance to the target device only after loading, which can cause memory spikes; change load_model to construct a torch.device from self.config.device and pass it as map_location to SALM.restore_from and SALM.from_pretrained so the model is allocated directly on the correct device; ensure torch is imported if not already and keep the existing .to(self.config.device) and eval() calls as a safety net.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/dataset/asr-leaderboard/prepare.py`:
- Around line 115-118: The load_dataset call for HF_REPO using variables
hf_config and hf_split must enable remote code execution for custom dataset
scripts; update the load_dataset invocation (the one assigning to dataset) to
pass trust_remote_code=True so datasets from HF_REPO (e.g., hf_config/hf_split)
that require custom loaders load correctly while keeping the surrounding logic
that casts the "audio" column (dataset.cast_column and Audio) unchanged.
In `@recipes/multimodal/server/backends/nemo_asr_backend.py`:
- Around line 236-240: The dict-hypothesis path in
NeMoASRBackend._parse_single_hypothesis lacks test coverage for missing/empty
"text" (it checks "transcript" as a fallback) while the object path is tested;
add a unit test to cover the dict branch that passes a dict like {"text": "",
"words": None} and assert text == "" and words == [] to ensure the same fallback
behavior; create test_nemo_asr_backend_empty_text_dict_hypothesis using
NeMoASRBackend(NeMoASRConfig(model_path="dummy")) and call
_parse_single_hypothesis with the dict to validate the result.
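The suggested dict-branch regression test can be sketched against a stand-in parser; the real `NeMoASRBackend._parse_single_hypothesis` signature and return shape are inferred from the review text, so treat this as an assumption to verify against the actual backend module:

```python
def parse_single_hypothesis(hyp):
    """Stand-in mirroring the dict branch described above: prefer 'text',
    fall back to 'transcript', and never fall back to str(hyp)."""
    if isinstance(hyp, dict):
        text = hyp.get("text") or hyp.get("transcript") or ""
        words = hyp.get("words") or []
        return text, words
    # Object-style hypotheses use attribute access with the same fallbacks.
    text = getattr(hyp, "text", None) or ""
    words = getattr(hyp, "words", None) or []
    return text, words


def test_empty_text_dict_hypothesis():
    # The case the review asks to cover: empty text, None words.
    text, words = parse_single_hypothesis({"text": "", "words": None})
    assert text == ""
    assert words == []


test_empty_text_dict_hypothesis()
```

The key property under test is that an empty `"text"` yields an empty string rather than a verbose `str(hyp)` representation.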
---
Nitpick comments:
In `@recipes/asr/run_hf_leaderboard.py`:
- Line 41: The import shadows Python's builtin eval; change the import line that
brings in nemo_skills.pipeline.cli.eval to use an alias (e.g., import eval as
run_eval) and update all usages of eval in this module to call run_eval instead;
keep wrap_arguments as-is and ensure any references to the original eval symbol
(function or variable) are replaced with the new alias to avoid builtin
shadowing.
In `@recipes/multimodal/server/backends/salm_backend.py`:
- Around line 132-147: Extract the duplicated audio-request logic into a shared
mixin in base.py (e.g., AudioInputMixin) and have both SALM backend and
NeMoASRBackend inherit/use it: move the implementations of
_get_request_audio_bytes and validate_request into AudioInputMixin (using
self.name for backend-specific messages) and replace the local methods in SALM
backend (methods named _get_request_audio_bytes and validate_request) so they
defer to the mixin implementation; ensure the mixin raises the same errors and
preserves the single-audio constraint and message text.
- Around line 84-104: The load_model method currently moves the SALM instance to
the target device only after loading, which can cause memory spikes; change
load_model to construct a torch.device from self.config.device and pass it as
map_location to SALM.restore_from and SALM.from_pretrained (i.e., call
SALM.restore_from(model_ref, map_location=map_location) or
SALM.from_pretrained(model_ref, map_location=map_location)) so the model is
allocated directly on the correct device; ensure torch is imported if not
already and keep the existing .to(self.config.device) and eval() calls as a
safety net.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 71b193fa-8ba1-420b-bec7-f19ae7c99b64
📒 Files selected for processing (5)
- nemo_skills/dataset/asr-leaderboard/prepare.py
- recipes/asr/run_hf_leaderboard.py
- recipes/multimodal/server/backends/__init__.py
- recipes/multimodal/server/backends/nemo_asr_backend.py
- recipes/multimodal/server/backends/salm_backend.py
Actionable comments posted: 4
♻️ Duplicate comments (1)
nemo_skills/dataset/asr-leaderboard/prepare.py (1)
Lines 115-118: ⚠️ Potential issue | 🟠 Major — Add `trust_remote_code=True` to the `load_dataset` call. The `hf-audio/esb-datasets-test-only-sorted` repository uses custom dataset loading scripts and requires `trust_remote_code=True`. Without this flag, dataset loading will fail with recent versions of the `datasets` library.

🔧 Proposed fix

```diff
 print(f"Loading {dataset_name} from {HF_REPO} (config={hf_config}, split={hf_split})...")
-dataset = load_dataset(HF_REPO, hf_config, split=hf_split)
+dataset = load_dataset(HF_REPO, hf_config, split=hf_split, trust_remote_code=True)
 if with_audio and "audio" in dataset.column_names:
     dataset = dataset.cast_column("audio", Audio(sampling_rate=AUDIO_SAMPLE_RATE))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/dataset/asr-leaderboard/prepare.py` around lines 115-118, the load_dataset call in prepare.py fails for repos with custom dataset scripts; update the call to include trust_remote_code=True so remote code is allowed when loading HF_REPO (the call that uses load_dataset(HF_REPO, hf_config, split=hf_split)); keep the rest of the logic (the with_audio check and dataset.cast_column("audio", Audio(sampling_rate=AUDIO_SAMPLE_RATE))) unchanged.
🧹 Nitpick comments (2)
nemo_skills/dataset/asr-leaderboard/prepare.py (1)
Lines 56-69: Docstring claims broader handling than the implementation provides. The docstring states it handles "the newer AudioDecoder object from torchcodec-based datasets library," but the implementation only tries dict-style access. If AudioDecoder uses a different API, this silently returns `(None, None)` rather than actually extracting the audio. Additionally, `IndexError` is unusual for dict/object access patterns. Consider either updating the docstring to reflect current behavior, or adding explicit AudioDecoder handling if that format is expected.
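A minimal sketch of the broader handling the docstring promises; the attribute names on decoder-like objects (`array`, `sampling_rate`) are assumptions here, since the actual AudioDecoder API should be checked before adopting this:

```python
def extract_audio(audio_info):
    """Return (samples, sampling_rate) from a dict or decoder-like object,
    else (None, None). Only narrow, expected exceptions are caught."""
    # Dict-style access, as produced by the classic `datasets` Audio feature.
    if isinstance(audio_info, dict):
        try:
            return audio_info["array"], int(audio_info["sampling_rate"])
        except (KeyError, TypeError, ValueError):
            return None, None
    # Attribute-style access for decoder-like objects (attribute names assumed).
    if hasattr(audio_info, "array") and hasattr(audio_info, "sampling_rate"):
        try:
            return audio_info.array, int(audio_info.sampling_rate)
        except (TypeError, ValueError):
            return None, None
    return None, None
```

Catching only `KeyError`/`TypeError`/`ValueError` (not `IndexError` or bare `Exception`) keeps genuinely unexpected failures visible.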
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/dataset/asr-leaderboard/prepare.py` around lines 56-69, the docstring for extract_audio claims support for a torchcodec AudioDecoder but the implementation only handles dict-style access and swallows mismatches; update the function to robustly handle both dicts and AudioDecoder-like objects (or else update the docstring to remove that claim). Concretely, in extract_audio detect input type: if isinstance(audio_info, dict) use audio_info["array"] and ["sampling_rate"]; otherwise check for attributes or methods commonly exposed by AudioDecoder (e.g., hasattr(audio_info, "array") and hasattr(audio_info, "sampling_rate"), or a to_numpy()/to_array()/numpy() method), retrieve the array and sampling rate with getattr/call as needed, convert the array to np.array and sampling_rate to int, and only catch appropriate exceptions (AttributeError, TypeError, ValueError) instead of IndexError; if unsupported, return (None, None) as before. Ensure the docstring matches the actual supported formats.

recipes/asr/run_hf_leaderboard.py (1)
Line 70: Preserve backend-specific flag passthrough here. `serve_unified` already supports backend-specific CLI args, but this wrapper calls `parse_args()` and only forwards `--backend`/`--batch_size`. That makes SALM-only knobs like `--user_prompt` unavailable from this recipe and forces future backend additions back into this file. `parse_known_args()` plus appending the unknown args to `server_args` would keep the recipe backend-agnostic. Based on learnings, in `nemo_skills/inference/server/serve_unified.py` the CLI entrypoint uses `parse_known_args()` intentionally to capture backend-specific arguments in `extra_args`, which are then processed by `parse_extra_args()`. This backend-agnostic design allows different backends to have their own CLI flags without hardcoding them in the entrypoint.

Also applies to: 92-92
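The passthrough pattern suggested above can be sketched as follows; the `--user_prompt` flag and the `server_args` shape are illustrative assumptions, not the recipe's actual argument list:

```python
import argparse

parser = argparse.ArgumentParser(description="HF ASR leaderboard runner (sketch)")
parser.add_argument("--backend", default="nemo_asr")
parser.add_argument("--batch_size", type=int, default=16)

# parse_known_args() keeps backend-specific flags (e.g. a hypothetical --user_prompt)
# instead of erroring out, so they can be forwarded to the server verbatim.
args, extra = parser.parse_known_args(["--backend", "salm", "--user_prompt", "Transcribe:"])

server_args = ["--backend", args.backend, "--batch_size", str(args.batch_size)]
server_args += extra  # forward unknown flags to serve_unified untouched
```

With this shape, adding a new backend flag requires no change to the recipe at all.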
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/asr/run_hf_leaderboard.py` at line 70, Replace parser.parse_args() with parser.parse_known_args() to capture backend-specific flags: call args, extra = parser.parse_known_args() then keep using args.backend/args.batch_size but append the extra list to the server_args you build for invoking serve_unified (e.g., server_args += extra). This preserves SALM-specific flags like --user_prompt and any future backend flags so serve_unified (and its parse_extra_args) receives unknown/backend-specific arguments without hardcoding them in this recipe; update all other parse_args() occurrences similarly (e.g., the later call around line 92) to parse_known_args and append unknowns to server_args.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/asr/run_hf_leaderboard.py`:
- Line 41: The import of eval from nemo_skills.pipeline.cli shadows Python's
builtin eval; change the import to alias it (e.g., from nemo_skills.pipeline.cli
import eval as pipeline_eval, wrap_arguments) and update all call sites that
currently call eval (notably the call at the former line 72) to use the aliased
name pipeline_eval so you invoke the pipeline entrypoint instead of the builtin.
In `@recipes/multimodal/server/backends/salm_backend.py`:
- Around line 173-176: The code incorrectly reads user_prompt and max_new_tokens
from only the first valid request (via first_extra) and applies them to the
whole batch; update the logic in the block that builds per-request inputs
(referencing requests, valid_indices, first_extra, user_prompt, max_new_tokens,
and GenerationRequest.max_new_tokens) so you compute user_prompt and
max_new_tokens per request by reading each request.extra_params (falling back to
self.salm_config.user_prompt and self.config.max_new_tokens respectively) when
iterating valid_indices; ensure any unsupported extra_params raise or are
ignored per guidelines so per-request overrides are honored rather than using
the first request for the entire batch.
- Around line 127-128: The current broad except block in the SALM warmup (the
except Exception as e logging "SALM warmup skipped due to error: %s") hides
startup failures; remove the blanket catch or change it to re-raise after
logging so warmup errors fail startup. Locate the warmup call in salm_backend.py
(the try/except that logs via logger.warning) and either delete the try/except
so exceptions propagate, or keep the log but immediately raise the exception
(raise) after logging to ensure startup fails on unexpected warmup errors.
- Around line 141-147: The validate_request method currently only checks for
presence of audio but allows audio_bytes_list of any length; update
validate_request(GenerationRequest) to enforce the single-audio constraint by
returning an error when request.audio_bytes_list is not None and
len(request.audio_bytes_list) != 1 (e.g., "SALM backend requires exactly one
audio input"), so invalid multi-audio requests fail early (before
batching/temp-file handling) instead of deferring to _get_request_audio_bytes().
---
Duplicate comments:
In `@nemo_skills/dataset/asr-leaderboard/prepare.py`:
- Around line 115-118: The load_dataset call in prepare.py fails for repos with
custom dataset scripts; update the call to include trust_remote_code=True so
remote code is allowed when loading HF_REPO (the call that uses
load_dataset(HF_REPO, hf_config, split=hf_split)); keep the rest of the logic
(the with_audio check and dataset.cast_column("audio",
Audio(sampling_rate=AUDIO_SAMPLE_RATE))) unchanged.
---
Nitpick comments:
In `@nemo_skills/dataset/asr-leaderboard/prepare.py`:
- Around line 56-69: The docstring for extract_audio claims support for a
torchcodec AudioDecoder but the implementation only handles dict-style access
and swallows mismatches; update the function to robustly handle both dicts and
AudioDecoder-like objects (or else update the docstring to remove that claim).
Concretely, in extract_audio detect input type: if isinstance(audio_info, dict)
use audio_info["array"] and ["sampling_rate"]; otherwise check for attributes or
methods commonly exposed by AudioDecoder (e.g., hasattr(audio_info, "array") and
hasattr(audio_info, "sampling_rate"), or a to_numpy()/to_array()/numpy()
method), retrieve the array and sampling rate with getattr/call as needed,
convert the array to np.array and sampling_rate to int, and only catch
appropriate exceptions (AttributeError, TypeError, ValueError) instead of
IndexError; if unsupported, return (None, None) as before. Ensure the docstring
matches the actual supported formats and reference extract_audio in your
changes.
In `@recipes/asr/run_hf_leaderboard.py`:
- Line 70: Replace parser.parse_args() with parser.parse_known_args() to capture
backend-specific flags: call args, extra = parser.parse_known_args() then keep
using args.backend/args.batch_size but append the extra list to the server_args
you build for invoking serve_unified (e.g., server_args += extra). This
preserves SALM-specific flags like --user_prompt and any future backend flags so
serve_unified (and its parse_extra_args) receives unknown/backend-specific
arguments without hardcoding them in this recipe; update all other parse_args()
occurrences similarly (e.g., the later call around line 92) to parse_known_args
and append unknowns to server_args.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b8efa982-7144-4d7c-bec5-f4380406e644
📒 Files selected for processing (5)
- nemo_skills/dataset/asr-leaderboard/prepare.py
- recipes/asr/run_hf_leaderboard.py
- recipes/multimodal/server/backends/__init__.py
- recipes/multimodal/server/backends/nemo_asr_backend.py
- recipes/multimodal/server/backends/salm_backend.py
🚧 Files skipped from review as they are similar to previous changes (2)
- recipes/multimodal/server/backends/__init__.py
- recipes/multimodal/server/backends/nemo_asr_backend.py
Thank you for the PR! Minor comments only. Also, could you please add a Slurm test, the same as for the ASR/TTS backends? You can use the same audio as in the ASR backend test.
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Force-pushed from 62273ab to 6f4f93b.
Actionable comments posted: 2
♻️ Duplicate comments (1)
recipes/multimodal/server/backends/salm_backend.py (1)
Lines 173-191: ⚠️ Potential issue | 🟠 Major — Don't let the first request define the whole batch. `user_prompt` and `max_new_tokens` are still taken from the first valid request and then reused for every prompt in the batch. This also bypasses the typed `GenerationRequest.user_prompt` field and silently accepts the other generation overrides on `GenerationRequest`/`BackendConfig` without applying them. Please either batch only requests with identical settings or reject unsupported per-request overrides up front. As per coding guidelines, "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing. Use dataclass or **kwargs syntax to handle this automatically."

Also applies to: 208-212
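Option (B) from this comment — grouping requests into sub-batches keyed by identical settings — might look like the sketch below; the dict-shaped requests and the two key fields stand in for the real `GenerationRequest` attributes:

```python
from collections import defaultdict


def group_by_settings(requests):
    """Group request indices by their (user_prompt, max_new_tokens) overrides so
    each sub-batch can be generated with one consistent set of settings.

    Requests are plain dicts here; the key fields mirror the review discussion
    and are assumptions about the backend's actual request type.
    """
    groups = defaultdict(list)
    for idx, req in enumerate(requests):
        key = (req.get("user_prompt"), req.get("max_new_tokens"))
        groups[key].append(idx)
    return dict(groups)
```

The backend would then call `self._model.generate(...)` once per group, so no request's overrides are silently ignored.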
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/salm_backend.py` around lines 173 - 191, The current batch uses values from first_req (first_req, user_prompt, max_new_tokens) for every prompt, silently ignoring per-request GenerationRequest.user_prompt and per-request generation overrides; update the handler to either (A) validate that every request in valid_indices has identical relevant settings (user_prompt, max_new_tokens and any other generation overrides referenced in the backend) before building prompts and if any mismatch exists raise/return an error that per-request overrides are unsupported, or (B) group requests into sub-batches keyed by their identical settings and call self._model.generate separately per group; ensure both the prompt construction code that builds prompts from temp_paths and the other similar block (around the other occurrence noted at lines 208-212) use the same validation/grouping logic so no request's overrides are ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@recipes/multimodal/server/backends/salm_backend.py`:
- Around line 130-146: validate_request currently allows both
GenerationRequest.audio_bytes and audio_bytes_list to be set and
_get_request_audio_bytes silently prefers audio_bytes; change validation to
reject ambiguous requests by returning an error when both audio_bytes is not
None and audio_bytes_list is not None and len(audio_bytes_list) > 0, e.g. in
validate_request check for the simultaneous presence of audio_bytes and a
non-empty audio_bytes_list and return a clear error like "Specify either
audio_bytes or audio_bytes_list, not both"; then keep _get_request_audio_bytes
as-is (it can assume validate_request enforced exclusivity) or update it to
raise if both are present as a defensive fallback.
- Around line 163-170: The current try/except around calling
_get_request_audio_bytes and writing temp files in the Salm backend is too broad
and hides unexpected failures; change the handler in the block that builds
temp_paths/valid_indices (and the similar block at the other site around lines
216-217) to only catch explicit request-validation errors (e.g., ValueError or
your framework's RequestValidationError) and convert those into
GenerationResult(error=...), while letting other exceptions (IOError,
tokenizer/model errors, etc.) propagate after optionally logging; reference the
_get_request_audio_bytes call, the temp_paths and valid_indices updates, and the
results[idx] = GenerationResult(...) assignment to locate and narrow the
exception handling.
---
Duplicate comments:
In `@recipes/multimodal/server/backends/salm_backend.py`:
- Around line 173-191: The current batch uses values from first_req (first_req,
user_prompt, max_new_tokens) for every prompt, silently ignoring per-request
GenerationRequest.user_prompt and per-request generation overrides; update the
handler to either (A) validate that every request in valid_indices has identical
relevant settings (user_prompt, max_new_tokens and any other generation
overrides referenced in the backend) before building prompts and if any mismatch
exists raise/return an error that per-request overrides are unsupported, or (B)
group requests into sub-batches keyed by their identical settings and call
self._model.generate separately per group; ensure both the prompt construction
code that builds prompts from temp_paths and the other similar block (around the
other occurrence noted at lines 208-212) use the same validation/grouping logic
so no request's overrides are ignored.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f403486b-770f-4011-9f2e-6bb5651d541d
📒 Files selected for processing (3)
- recipes/asr/run_hf_leaderboard.py
- recipes/multimodal/server/backends/nemo_asr_backend.py
- recipes/multimodal/server/backends/salm_backend.py
🚧 Files skipped from review as they are similar to previous changes (2)
- recipes/multimodal/server/backends/nemo_asr_backend.py
- recipes/asr/run_hf_leaderboard.py
```python
def _get_request_audio_bytes(self, request: GenerationRequest) -> bytes:
    if request.audio_bytes:
        return request.audio_bytes
    if request.audio_bytes_list:
        if len(request.audio_bytes_list) > 1:
            raise ValueError("SALM backend currently supports one audio input per request.")
        return request.audio_bytes_list[0]
    raise ValueError("Request must contain audio_bytes/audio_bytes_list")

def validate_request(self, request: GenerationRequest) -> Optional[str]:
    has_audio = request.audio_bytes is not None or (
        request.audio_bytes_list is not None and len(request.audio_bytes_list) > 0
    )
    if not has_audio:
        return "SALM backend requires audio input"
    if request.audio_bytes_list is not None and len(request.audio_bytes_list) > 1:
        return "SALM backend currently supports one audio input per request"
```
Reject ambiguous requests that set both audio inputs.

`validate_request()` currently allows `audio_bytes` and `audio_bytes_list` together, and `_get_request_audio_bytes()` then silently prefers `audio_bytes`. That means one user-supplied audio payload is ignored without any error.
💡 Proposed fix

```diff
 def _get_request_audio_bytes(self, request: GenerationRequest) -> bytes:
+    if request.audio_bytes is not None and request.audio_bytes_list is not None:
+        raise ValueError("SALM backend accepts either audio_bytes or audio_bytes_list, not both")
     if request.audio_bytes:
         return request.audio_bytes
     if request.audio_bytes_list:
         if len(request.audio_bytes_list) > 1:
             raise ValueError("SALM backend currently supports one audio input per request.")
         return request.audio_bytes_list[0]
     raise ValueError("Request must contain audio_bytes/audio_bytes_list")

 def validate_request(self, request: GenerationRequest) -> Optional[str]:
+    if request.audio_bytes is not None and request.audio_bytes_list is not None:
+        return "SALM backend accepts either audio_bytes or audio_bytes_list, not both"
     has_audio = request.audio_bytes is not None or (
         request.audio_bytes_list is not None and len(request.audio_bytes_list) > 0
     )
```

As per coding guidelines, "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing. Use dataclass or **kwargs syntax to handle this automatically."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@recipes/multimodal/server/backends/salm_backend.py` around lines 130 - 146,
validate_request currently allows both GenerationRequest.audio_bytes and
audio_bytes_list to be set and _get_request_audio_bytes silently prefers
audio_bytes; change validation to reject ambiguous requests by returning an
error when both audio_bytes is not None and audio_bytes_list is not None and
len(audio_bytes_list) > 0, e.g. in validate_request check for the simultaneous
presence of audio_bytes and a non-empty audio_bytes_list and return a clear
error like "Specify either audio_bytes or audio_bytes_list, not both"; then keep
_get_request_audio_bytes as-is (it can assume validate_request enforced
exclusivity) or update it to raise if both are present as a defensive fallback.
```python
try:
    audio_bytes = self._get_request_audio_bytes(req)
    p = tmp_dir / f"req_{idx:04d}.wav"
    p.write_bytes(audio_bytes)
    temp_paths.append(str(p))
    valid_indices.append(idx)
except Exception as e:
    results[idx] = GenerationResult(error=str(e), request_id=req.request_id)
```
Only catch expected request-validation failures here.
These blanket handlers also swallow temp-file, model, and tokenizer failures, turning backend regressions into plain request errors with no traceback. Catch the expected validation errors explicitly and let unexpected exceptions propagate after logging.
Based on learnings, "Don't catch exceptions when they are not expected to be normally raised; let the code fail when something unexpected happens so users notice it instead of silently misbehaving."
Also applies to: 216-217
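A narrowed handler along the lines suggested here could look like this sketch; the helper name and the injected callables are illustrative, not the backend's actual API:

```python
import logging

logger = logging.getLogger("salm_backend")


def stage_audio(req_index, get_audio_bytes, write_file):
    """Stage one request's audio to a temp file.

    Only ValueError (the expected request-validation failure) is converted
    into a per-request error string; anything else (I/O, model bugs) is
    logged and re-raised so regressions surface with a traceback.
    """
    try:
        write_file(get_audio_bytes())
        return None  # staged successfully, no per-request error
    except ValueError as e:
        return str(e)  # expected validation failure -> per-request error
    except Exception:
        logger.exception("Unexpected failure staging request %d", req_index)
        raise
```

The same narrowing applies to the second blanket handler at lines 216-217.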
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 169-169: Do not catch blind exception: Exception (BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@recipes/multimodal/server/backends/salm_backend.py` around lines 163 - 170,
The current try/except around calling _get_request_audio_bytes and writing temp
files in the Salm backend is too broad and hides unexpected failures; change the
handler in the block that builds temp_paths/valid_indices (and the similar block
at the other site around lines 216-217) to only catch explicit
request-validation errors (e.g., ValueError or your framework's
RequestValidationError) and convert those into GenerationResult(error=...),
while letting other exceptions (IOError, tokenizer/model errors, etc.) propagate
after optionally logging; reference the _get_request_audio_bytes call, the
temp_paths and valid_indices updates, and the results[idx] =
GenerationResult(...) assignment to locate and narrow the exception handling.
♻️ Duplicate comments (4)
recipes/multimodal/server/backends/salm_backend.py (4)
130-147: ⚠️ Potential issue | 🟠 Major: Reject requests that set both audio fields.
`validate_request()` still accepts a request with both `audio_bytes` and `audio_bytes_list`, and `_get_request_audio_bytes()` silently prefers one payload. That means caller-supplied input is ignored without an error. As per coding guidelines, "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing. Use dataclass or **kwargs syntax to handle this automatically."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/salm_backend.py` around lines 130-147: the validator currently allows requests that set both audio_bytes and audio_bytes_list while _get_request_audio_bytes silently prefers one; update validate_request to detect and reject requests where both request.audio_bytes is not None and request.audio_bytes_list is not None (return a clear error like "Specify either audio_bytes or audio_bytes_list, not both") and also harden _get_request_audio_bytes to raise the same ValueError if both are present (so callers cannot have input ignored); reference validate_request and _get_request_audio_bytes to locate and change the logic and error messages accordingly.
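The mutual-exclusivity check asked for above can be sketched as follows. The `GenerationRequest` shape shown here and the convention that `validate_request` returns an error string (or `None`) are assumptions for illustration, not the server's actual API:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenerationRequest:
    # Hypothetical shape; the real class lives in the unified-server code.
    audio_bytes: Optional[bytes] = None
    audio_bytes_list: Optional[List[bytes]] = None


def validate_request(req: GenerationRequest) -> Optional[str]:
    """Return an error message for ambiguous requests, None when valid."""
    if req.audio_bytes is not None and req.audio_bytes_list:
        return "Specify either audio_bytes or audio_bytes_list, not both"
    if req.audio_bytes is None and not req.audio_bytes_list:
        return "One of audio_bytes or audio_bytes_list is required"
    return None
```

With this in place, `_get_request_audio_bytes` can assume exclusivity was already enforced, or raise the same error defensively.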
163-170: ⚠️ Potential issue | 🟠 Major: Only catch expected validation errors in `generate()`.
These blanket handlers also swallow temp-file, filesystem, model, and tokenizer failures, turning backend regressions into plain request errors with no traceback. If you want per-request validation errors here, catch `ValueError` from `_get_request_audio_bytes()` and let the rest propagate. As per coding guidelines, "Don't catch exceptions when they are not expected to be normally raised; let the code fail when something unexpected happens so users notice it instead of silently misbehaving."
Also applies to: 216-217
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/salm_backend.py` around lines 163 - 170, In generate(), only catch validation errors from _get_request_audio_bytes() instead of blanket Exception: change the try/except that populates temp_paths, valid_indices and sets results[idx] = GenerationResult(...) to catch ValueError (not Exception) so filesystem/model/tokenizer errors propagate; apply the same change to the second analogous try/except handling request audio bytes (the other block around setting results entries) so unexpected errors are not swallowed.
173-176: ⚠️ Potential issue | 🟠 Major: Don't let the first valid request define the whole batch.
`user_prompt` and `max_new_tokens` are read once from `valid_indices[0]` and then reused for every prompt. Later requests lose their own overrides, and `GenerationRequest.user_prompt` is never consulted here either. Either build prompts from each request or reject mixed-setting batches explicitly. As per coding guidelines, "Avoid cases where user-passed parameters are unused; code should fail if user specifies an unsupported argument or if a required argument is missing. Use dataclass or **kwargs syntax to handle this automatically."
Also applies to: 179-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/salm_backend.py` around lines 173 - 176, The code currently reads user_prompt and max_new_tokens once from first_req (variables first_req, first_extra, user_prompt, max_new_tokens) and reuses them for the whole batch, which ignores per-request overrides (GenerationRequest.user_prompt and per-request max_new_tokens) and can silently drop user-provided settings; change the batching logic so you either (A) build the prompt and token limit per-request inside the loop that processes valid_indices (use each request's .extra_params / .user_prompt and .max_new_tokens) or (B) explicitly reject mixed-setting batches by validating that all requests in valid_indices have identical user_prompt and max_new_tokens and raise an error if not; update any downstream use sites that reference those shared variables to consume per-request values or the validated common values accordingly.
123-126: ⚠️ Potential issue | 🟠 Major: Let warmup failures fail startup.
Warmup exercises the same `generate()` path used for real traffic. Swallowing every exception here can leave the backend marked loaded even when generation is broken. As per coding guidelines, "Don't catch exceptions when they are not expected to be normally raised; let the code fail when something unexpected happens so users notice it instead of silently misbehaving."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@recipes/multimodal/server/backends/salm_backend.py` around lines 123 - 126, The warmup currently swallows all exceptions around self._model.generate and logs a warning via logger.warning, which can mask real startup failures; remove the broad except or change it to only catch expected, recoverable exceptions and re-raise any unexpected ones so startup fails on generation errors — locate the SALM warmup block that calls self._model.generate(...) and the logger.warning("SALM warmup skipped due to error: %s", e) and either delete the try/except so exceptions propagate or replace the except with specific exception types and a re-raise for anything else.
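A minimal sketch of the log-and-re-raise pattern the comment suggests; `warmup` and its arguments are hypothetical stand-ins for the SALM backend's warmup path:

```python
import logging

logger = logging.getLogger(__name__)


def warmup(model, sample_input):
    """Run one generation at startup; log and re-raise so a broken model
    fails loading instead of being silently marked healthy."""
    try:
        model.generate(sample_input)
    except Exception:
        logger.exception("SALM warmup failed; refusing to mark backend loaded")
        raise
```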
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@recipes/multimodal/server/backends/salm_backend.py`:
- Around line 130-147: The validator currently allows requests that set both
audio_bytes and audio_bytes_list while _get_request_audio_bytes silently prefers
one; update validate_request to detect and reject requests where both
request.audio_bytes is not None and request.audio_bytes_list is not None (return
a clear error like "Specify either audio_bytes or audio_bytes_list, not both")
and also harden _get_request_audio_bytes to raise the same ValueError if both
are present (so callers cannot have input ignored); reference validate_request
and _get_request_audio_bytes to locate and change the logic and error messages
accordingly.
- Around line 163-170: In generate(), only catch validation errors from
_get_request_audio_bytes() instead of blanket Exception: change the try/except
that populates temp_paths, valid_indices and sets results[idx] =
GenerationResult(...) to catch ValueError (not Exception) so
filesystem/model/tokenizer errors propagate; apply the same change to the second
analogous try/except handling request audio bytes (the other block around
setting results entries) so unexpected errors are not swallowed.
- Around line 173-176: The code currently reads user_prompt and max_new_tokens
once from first_req (variables first_req, first_extra, user_prompt,
max_new_tokens) and reuses them for the whole batch, which ignores per-request
overrides (GenerationRequest.user_prompt and per-request max_new_tokens) and can
silently drop user-provided settings; change the batching logic so you either
(A) build the prompt and token limit per-request inside the loop that processes
valid_indices (use each request's .extra_params / .user_prompt and
.max_new_tokens) or (B) explicitly reject mixed-setting batches by validating
that all requests in valid_indices have identical user_prompt and max_new_tokens
and raise an error if not; update any downstream use sites that reference those
shared variables to consume per-request values or the validated common values
accordingly.
- Around line 123-126: The warmup currently swallows all exceptions around
self._model.generate and logs a warning via logger.warning, which can mask real
startup failures; remove the broad except or change it to only catch expected,
recoverable exceptions and re-raise any unexpected ones so startup fails on
generation errors — locate the SALM warmup block that calls
self._model.generate(...) and the logger.warning("SALM warmup skipped due to
error: %s", e) and either delete the try/except so exceptions propagate or
replace the except with specific exception types and a re-raise for anything
else.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 90d22a12-dfc9-41e9-bf91-547dec1b878c
📒 Files selected for processing (3)
- recipes/asr/run_hf_leaderboard.py
- recipes/multimodal/server/backends/nemo_asr_backend.py
- recipes/multimodal/server/backends/salm_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
- recipes/multimodal/server/backends/nemo_asr_backend.py
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/slurm-tests/unified_salm/check_results.py (1)
88-94: Use direct key access for required output fields.
If `id` and `generation` are required output-schema fields, prefer explicit key checks plus `row["..."]` instead of `.get(...)` to avoid silent fallback behavior.
Proposed fix

```diff
-        sample_id = row.get("id")
+        if "id" not in row:
+            soft_assert(False, "Missing required key 'id' in output row")
+            continue
+        sample_id = row["id"]
@@
-        transcript = (row.get("generation") or "").strip()
+        if "generation" not in row:
+            soft_assert(False, f"Sample {sample_id}: missing required key 'generation'")
+            continue
+        transcript = (row["generation"] or "").strip()
```

As per coding guidelines: "Don't use `.get()` for accessing dictionary keys if the code expects them to be present; use direct access `data[key_name]` to fail with a clear error instead of silently corrupting data."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/slurm-tests/unified_salm/check_results.py` around lines 88 - 94, Replace use of row.get(...) with direct key access for required fields: change sample_id = row.get("id") to sample_id = row["id"] and transcript = (row.get("generation") or "").strip() to transcript = row["generation"].strip(); keep the existing soft_assert checks (soft_assert(sample_id in references, ...), soft_assert(bool(transcript), ...)) and the early continue when sample_id not in references so missing/invalid keys raise a clear KeyError rather than silently falling back.
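The guideline can be illustrated with a small helper; `read_row` is a hypothetical name, not part of check_results.py:

```python
def read_row(row):
    """Access required output fields directly so a malformed row fails loudly."""
    sample_id = row["id"]  # KeyError here is a clear signal, unlike .get() -> None
    transcript = (row["generation"] or "").strip()
    return sample_id, transcript
```

A missing key raises `KeyError` immediately, instead of `.get(...)` returning `None` and corrupting downstream WER bookkeeping.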
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/slurm-tests/run_all.sh`:
- Line 24: The shell invocation uses unquoted variables CLUSTER and RUN_NAME
which can break on whitespace; update the command that calls python
tests/slurm-tests/unified_salm/run_test.py to quote all variable expansions —
replace occurrences of $CLUSTER and $RUN_NAME (both in --cluster, the
--workspace path, and --expname_prefix) with "$CLUSTER" and "$RUN_NAME" so the
values are treated as single arguments even if they contain spaces or special
characters.
In `@tests/slurm-tests/unified_salm/check_results.py`:
- Around line 15-19: The current count-equality check can miss missing or
duplicated sample IDs; update the block that builds and compares output IDs and
reference IDs so it (1) computes sets for both (e.g., output_ids_set and
reference_ids_set) and asserts set equality instead of just equal counts, and
(2) detects duplicates by comparing len(output_ids_list) vs len(output_ids_set)
(and similarly for reference list) and fail with a clear error when duplicates
are found; modify the comparison logic used around the output/reference ID check
to first fail on duplicates and then verify exact set membership (missing/extra
IDs) rather than only comparing totals.
---
Nitpick comments:
In `@tests/slurm-tests/unified_salm/check_results.py`:
- Around line 88-94: Replace use of row.get(...) with direct key access for
required fields: change sample_id = row.get("id") to sample_id = row["id"] and
transcript = (row.get("generation") or "").strip() to transcript =
row["generation"].strip(); keep the existing soft_assert checks
(soft_assert(sample_id in references, ...), soft_assert(bool(transcript), ...))
and the early continue when sample_id not in references so missing/invalid keys
raise a clear KeyError rather than silently falling back.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ef160864-9ad8-4a15-9e6f-7d4e52076298
📒 Files selected for processing (4)
- tests/slurm-tests/run_all.sh
- tests/slurm-tests/unified_salm/check_results.py
- tests/slurm-tests/unified_salm/run_test.py
- tests/slurm-tests/unified_salm/salm_openai.test
✅ Files skipped from review due to trivial changes (1)
- tests/slurm-tests/unified_salm/salm_openai.test
```shell
# sleep 10
python tests/slurm-tests/unified_tts/run_test.py --cluster $CLUSTER --workspace /workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_tts --expname_prefix unified_tts_$RUN_NAME
# sleep 10
python tests/slurm-tests/unified_salm/run_test.py --cluster $CLUSTER --workspace /workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_salm --expname_prefix unified_salm_$RUN_NAME
```
🧩 Analysis chain
🏁 Script executed:
```shell
# Check if file exists and examine the relevant section
cd /repo && wc -l tests/slurm-tests/run_all.sh && head -30 tests/slurm-tests/run_all.sh
```
Repository: NVIDIA-NeMo/Skills
Length of output: 117
🏁 Script executed:
```shell
# Find the file and check its contents
find . -name "run_all.sh" -path "*/slurm-tests/*" 2>/dev/null
```
Repository: NVIDIA-NeMo/Skills
Length of output: 92
🏁 Script executed:
```shell
# Examine the file, particularly around line 24
wc -l ./tests/slurm-tests/run_all.sh && echo "---" && cat -n ./tests/slurm-tests/run_all.sh | head -40
```
Repository: NVIDIA-NeMo/Skills
Length of output: 2308
Quote shell variables in the new SALM invocation.
Line 24 uses unquoted `$CLUSTER` and `$RUN_NAME`; this can break on whitespace or special characters since these values come from external inputs (script arguments).
Proposed fix

```diff
-python tests/slurm-tests/unified_salm/run_test.py --cluster $CLUSTER --workspace /workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_salm --expname_prefix unified_salm_$RUN_NAME
+python tests/slurm-tests/unified_salm/run_test.py --cluster "$CLUSTER" --workspace "/workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_salm" --expname_prefix "unified_salm_$RUN_NAME"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
python tests/slurm-tests/unified_salm/run_test.py --cluster "$CLUSTER" --workspace "/workspace/nemo-skills-slurm-ci/$RUN_NAME/unified_salm" --expname_prefix "unified_salm_$RUN_NAME"
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 24-24: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 24-24: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 24-24: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/slurm-tests/run_all.sh` at line 24, The shell invocation uses unquoted
variables CLUSTER and RUN_NAME which can break on whitespace; update the command
that calls python tests/slurm-tests/unified_salm/run_test.py to quote all
variable expansions — replace occurrences of $CLUSTER and $RUN_NAME (both in
--cluster, the --workspace path, and --expname_prefix) with "$CLUSTER" and
"$RUN_NAME" so the values are treated as single arguments even if they contain
spaces or special characters.
```python
import argparse
import json
import re
import sys
from pathlib import Path
```
Count equality alone cannot guarantee sample coverage.
The check at line 85 combined with lines 87-101 can still pass when one reference ID is missing but another ID is duplicated. Add explicit set and duplicate checks for output IDs.
Proposed fix

```diff
 import argparse
+from collections import Counter
 import json
 import re
 import sys
 from pathlib import Path
@@
-    soft_assert(len(rows) == len(references), f"Expected {len(references)} outputs, found {len(rows)}")
+    soft_assert(len(rows) == len(references), f"Expected {len(references)} outputs, found {len(rows)}")
+    output_ids = [row["id"] for row in rows if "id" in row]
+    missing_ids = sorted(set(references) - set(output_ids))
+    extra_ids = sorted(set(output_ids) - set(references))
+    duplicate_ids = sorted([sid for sid, cnt in Counter(output_ids).items() if cnt > 1])
+    soft_assert(not missing_ids, f"Missing sample ids in output: {', '.join(missing_ids)}")
+    soft_assert(not extra_ids, f"Unexpected sample ids in output: {', '.join(extra_ids)}")
+    soft_assert(not duplicate_ids, f"Duplicate sample ids in output: {', '.join(duplicate_ids)}")
```

Also applies to: 85-101
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/slurm-tests/unified_salm/check_results.py` around lines 15 - 19, The
current count-equality check can miss missing or duplicated sample IDs; update
the block that builds and compares output IDs and reference IDs so it (1)
computes sets for both (e.g., output_ids_set and reference_ids_set) and asserts
set equality instead of just equal counts, and (2) detects duplicates by
comparing len(output_ids_list) vs len(output_ids_set) (and similarly for
reference list) and fail with a clear error when duplicates are found; modify
the comparison logic used around the output/reference ID check to first fail on
duplicates and then verify exact set membership (missing/extra IDs) rather than
only comparing totals.
Summary
- Adds a new `salm` unified-server backend for SALM models (e.g., canary-qwen-2.5b) that use `nemo.collections.speechlm2.models.SALM` and its chat-style generate() API. This complements the existing `nemo_asr` backend (for traditional ASR models) and the upcoming `vllm_nemo_speechlm` backend (for vLLM-served models).
- Fixes a bug in `nemo_asr_backend` where empty transcriptions caused the raw Hypothesis object repr (~698 words of tensor data) to be returned instead of an empty string, inflating WER scores.
- Adds a `run_hf_leaderboard.py` recipe for running HF Open ASR Leaderboard evaluation on NeMo models with a single command.