
[Feat] Add sandbox_mounts cluster config field with r/w permission support#1350

Open
sgunasekar wants to merge 4 commits into main from custom_sandbox_mount_paths

Conversation

@sgunasekar
Collaborator

@sgunasekar sgunasekar commented Apr 8, 2026

Introduces a new optional sandbox_mounts key in cluster configs that lets users specify exactly which paths are mounted into the sandbox container, with optional read/write mode (src:dst or src:dst:ro / src:dst:rw).
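
For illustration, a config using the new key might look like the following (a sketch only; the paths are hypothetical, not taken from this PR):

```yaml
# Hypothetical cluster config fragment illustrating the sandbox_mounts syntax.
sandbox_mounts:
  - /lustre/datasets:/datasets:ro   # read-only inside the sandbox
  - ${HOME}/scratch:/scratch:rw     # read-write; ${VAR} is expanded at launch
  - /workspace/tools:/tools         # mode omitted -> container default
```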

Mount resolution priority (highest to lowest):

  1. sandbox_mounts in config (non-empty) — used as-is, overrides --keep_mounts_for_sandbox
  2. keep_mounts_for_sandbox=True with no sandbox_mounts — inherit all cluster mounts
  3. keep_mounts_for_sandbox=False with no sandbox_mounts — no mounts (safe default)
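
The three-tier priority above can be sketched as follows. This is an illustrative re-implementation, not the actual nemo_skills code; the real logic lives in `get_sandbox_mounts_from_config` and the call sites this PR updates:

```python
def resolve_sandbox_mounts(cluster_config: dict, keep_mounts_for_sandbox: bool):
    """Sketch of the priority above: list = explicit, None = inherit, [] = none."""
    sandbox_mounts = cluster_config.get("sandbox_mounts")
    if sandbox_mounts:  # 1. non-empty config key wins over the CLI flag
        return list(sandbox_mounts)
    if keep_mounts_for_sandbox:  # 2. inherit all cluster mounts
        return None
    return []  # 3. safe default: nothing mounted into the sandbox
```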

Fully backward compatible: without sandbox_mounts in the cluster config, existing configs and the CLI flags --with_sandbox and --keep_mounts_for_sandbox behave exactly as before.

Code design choices

  • Both the declarative pipeline (Command.prepare_for_execution) and the legacy add_task path in exp.py are updated.
  • Source paths in sandbox_mounts are validated at launch time via run_exp, consistent with how regular mounts are checked.
  • Specifying sandbox_mounts overrides the --keep_mounts_for_sandbox flag as a safer priority.
  • Intentionally kept this simple rather than introducing more fine-grained mount resolution logic; in the future we can add more flags/sub-keys to selectively control mount paths at runtime.

Summary by CodeRabbit

  • New Features

    • Added sandbox_mounts config to explicitly control mounts for sandbox containers (supports src:dst and src:dst:mode with ro/rw).
  • Behavior Changes

    • sandbox_mounts now takes precedence over the previous keep-mounts behavior; when absent, previous fallback remains.
    • Mount entries support environment-variable expansion and stricter validation.
  • Documentation

    • Added docs describing syntax, precedence, and guidance.
  • Tests

    • Added tests for parsing, expansion, validation, and precedence.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds an optional cluster-config key sandbox_mounts, a resolver that expands ${VAR} and legacy {VAR} and validates optional :mode, changes sandbox-mount precedence (config > keep_mounts flag), integrates mount checks when configured, exposes the resolver, and adds tests covering precedence and expansion.

Changes

Cohort / File(s) Summary
Configuration Examples
cluster_configs/example-local.yaml, cluster_configs/example-slurm.yaml
Added commented examples and syntax notes for sandbox_mounts (src:dst or src:dst:mode with ro/rw) and clarified fallback to keep_mounts_for_sandbox when absent.
User Documentation
docs/basics/cluster-configs.md
Documented mounts vs sandbox_mounts, precedence rules, accepted mount string formats, :mode constraints, and guidance for when to use each option.
Mount Parsing & Helpers
nemo_skills/pipeline/utils/mounts.py
Added _resolve_mount_path_with_env_variables() to expand/validate ${VAR} and {VAR}, enforce src:dst or src:dst:mode formats and ro/rw modes; refactored get_mounts_from_config() to use the helper; added get_sandbox_mounts_from_config() which returns None when key absent and allows :mode.
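A rough sketch of the validation described here (function name, error messages, and the exact order of checks are illustrative; the real helper in mounts.py also handles the legacy {VAR} form, which this sketch omits):

```python
import os

def parse_mount(mount: str):
    """Validate `src:dst` or `src:dst:mode` after env-var expansion (sketch only)."""
    expanded = os.path.expandvars(mount)
    # Reject anything that still looks like an unresolved placeholder.
    if "$" in expanded or "{" in expanded or "}" in expanded:
        raise ValueError(f"Mount `{mount}` has unresolved placeholders")
    parts = expanded.split(":")
    if len(parts) == 2:
        src, dst, mode = parts[0], parts[1], None
    elif len(parts) == 3:
        src, dst, mode = parts
        if mode not in ("ro", "rw"):
            raise ValueError(f"Mount mode must be ro or rw, got `{mode}`")
    else:
        raise ValueError(f"Mount `{mount}` must be src:dst or src:dst:mode")
    # Explicitly reject empty pieces such as ":/dst" or "src:".
    if not src or not dst:
        raise ValueError(f"Mount `{mount}` has an empty source or target")
    return src, dst, mode
```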
Public Exports
nemo_skills/pipeline/utils/__init__.py
Re-exported get_sandbox_mounts_from_config into the nemo_skills.pipeline.utils namespace.
Pipeline Integration
nemo_skills/pipeline/utils/declarative.py, nemo_skills/pipeline/utils/exp.py
Replaced direct keep_mounts mapping with priority-based resolution using get_sandbox_mounts_from_config(); sandbox mounts now resolves to configured list, None (inherit), or [] (none); added pre-run remote-mount directory checks when sandbox_mounts provided.
Tests
tests/test_pipeline_utils.py
Added tests for sandbox mount precedence (config vs keep_mounts_for_sandbox), ${VAR} expansion in sandbox_mounts, and executor mounts selection semantics (None vs [] vs explicit list).

Sequence Diagram(s)

sequenceDiagram
    participant Config as Cluster Config
    participant Resolver as Mount Resolver
    participant Pipeline as Execution Pipeline
    participant RemoteCheck as Remote Mount Checker
    participant Executor as Sandbox Executor

    Pipeline->>Config: read cluster_config
    Pipeline->>Resolver: get_sandbox_mounts_from_config(cluster_config)
    alt sandbox_mounts present
        Resolver->>Resolver: expand ${VAR}, handle {VAR}, validate src:dst(:mode)
        Resolver-->>Pipeline: resolved mounts list
        Pipeline->>RemoteCheck: extract mount sources and verify remote dirs
    else sandbox_mounts absent
        Resolver-->>Pipeline: return None
        Pipeline->>Pipeline: consult keep_mounts_for_sandbox flag
    end
    Pipeline->>Executor: create sandbox executor with mounts (list / None / [])
    Executor-->>Pipeline: executor ready

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main feature addition: a new cluster config field sandbox_mounts with read/write permission support for sandbox container mounts.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
nemo_skills/pipeline/utils/mounts.py (1)

515-518: Use direct access for sandbox_mounts after the presence check.

Once the guard above confirms the key exists, .get("sandbox_mounts", []) just reintroduces a silent default on a required field.

Proposed fix
-    mounts = cluster_config.get("sandbox_mounts", [])
+    mounts = cluster_config["sandbox_mounts"]

As per coding guidelines, "Don't use .get() for accessing dictionary keys if the code expects them to be present; use direct access data[key_name] to fail with a clear error instead of silently corrupting data".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/utils/mounts.py` around lines 515 - 518, The code checks
for the presence of "sandbox_mounts" in cluster_config but then uses
cluster_config.get("sandbox_mounts", []), which reintroduces a silent default;
replace that .get call with direct access cluster_config["sandbox_mounts"]
(assign to mounts) so the code fails loudly if the key is missing and preserves
the earlier presence check; update the variable assignment that sets mounts to
use cluster_config["sandbox_mounts"] instead of .get.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/pipeline/utils/declarative.py`:
- Around line 249-266: The sandbox-only mounts are being applied to every script
because keep_mounts defaults True and there is no check that the script is a
SandboxScript; update the mount resolution in the block that uses
get_sandbox_mounts_from_config(cluster_config) so sandbox_mounts is only used
when self.script is a SandboxScript (e.g., guard with isinstance(self.script,
SandboxScript) or an equivalent sandbox-identifying attribute) and otherwise
fall back to the original behavior (keep_mounts -> None to inherit cluster
mounts, keep_mounts False -> []), ensuring execution_config["mounts"] is set
from exec_mounts accordingly.

In `@nemo_skills/pipeline/utils/exp.py`:
- Around line 803-812: The preflight mount check currently runs whenever
"sandbox_mounts" exists in cluster_config, causing unrelated SLURM jobs to fail;
modify run_exp (or the code path that handles SLURM jobs) to only call
get_sandbox_mounts_from_config and check_remote_mount_directories when a sandbox
sidecar will actually be launched (e.g., guard the existing block with the
sandbox-enabled condition used elsewhere or a boolean like is_sandbox_requested
derived from the experiment spec), keep the same use of exit_if_failure and
check_remote_mount_directories(mount_sources, cluster_config,
exit_on_failure=...), and ensure you reference the existing functions
sandbox_mounts, get_sandbox_mounts_from_config, and
check_remote_mount_directories so the check is executed only for sandbox runs.

In `@nemo_skills/pipeline/utils/mounts.py`:
- Around line 115-143: The code currently indexes mount_source[0] /
mount_target[0] which causes IndexError for empty pieces (e.g., ":/dst" or
"src:"); update the validation in the mounts parsing logic to explicitly reject
empty mount_source or mount_target before any indexing: after splitting into
mount_source, mount_target, mount_mode (and before the "{...}" handling) check
if mount_source == "" or mount_target == "" and raise a clear ValueError
describing the malformed mount entry; keep the existing mount_mode validation
and the subsequent env-var resolution for mount_source and mount_target
unchanged, but perform the emptiness check first so functions/variables named
mount_source, mount_target, and mount_mode are protected from IndexError.

In `@tests/test_pipeline_utils.py`:
- Around line 516-546: The tests call get_executor without required parameters
so they raise TypeError before reaching mount assertions; update
test_get_executor_empty_mounts_overrides_config and
test_get_executor_explicit_mounts_overrides_config to pass the same required
args used elsewhere (e.g. num_nodes=1, tasks_per_node=1, gpus_per_node=0,
job_name="test", log_dir="/tmp") when calling get_executor so the mounts
behavior is actually exercised.

---

Nitpick comments:
In `@nemo_skills/pipeline/utils/mounts.py`:
- Around line 515-518: The code checks for the presence of "sandbox_mounts" in
cluster_config but then uses cluster_config.get("sandbox_mounts", []), which
reintroduces a silent default; replace that .get call with direct access
cluster_config["sandbox_mounts"] (assign to mounts) so the code fails loudly if
the key is missing and preserves the earlier presence check; update the variable
assignment that sets mounts to use cluster_config["sandbox_mounts"] instead of
.get.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6e7b338f-3e6e-41b0-8fd0-c139d696ec6b

📥 Commits

Reviewing files that changed from the base of the PR and between 74650dd and 96ff3fc.

📒 Files selected for processing (8)
  • cluster_configs/example-local.yaml
  • cluster_configs/example-slurm.yaml
  • docs/basics/cluster-configs.md
  • nemo_skills/pipeline/utils/__init__.py
  • nemo_skills/pipeline/utils/declarative.py
  • nemo_skills/pipeline/utils/exp.py
  • nemo_skills/pipeline/utils/mounts.py
  • tests/test_pipeline_utils.py

Comment thread nemo_skills/pipeline/utils/exp.py
Comment on lines +803 to +812
if "sandbox_mounts" in cluster_config:
    sandbox_mounts = get_sandbox_mounts_from_config(cluster_config)
    mount_sources = [m.split(":")[0] for m in sandbox_mounts]
    LOG.info("Checking sandbox mount paths: %s", mount_sources)
    exit_if_failure = os.environ.get("NEMO_SKILLS_DISABLE_MOUNT_CHECK", "False").lower() not in (
        "1",
        "true",
        "yes",
    )
    check_remote_mount_directories(mount_sources, cluster_config, exit_on_failure=exit_if_failure)
Contributor

@coderabbitai coderabbitai Bot Apr 8, 2026


⚠️ Potential issue | 🟠 Major

Don't make sandbox_mounts a global preflight requirement.

run_exp() now validates these paths for every SLURM run that uses a cluster config containing sandbox_mounts, even if the experiment never launches a sandbox sidecar. That can fail unrelated training/inference jobs on sandbox-only paths, so this check needs to be gated on actual sandbox usage rather than config presence alone.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/utils/exp.py` around lines 803 - 812, The preflight
mount check currently runs whenever "sandbox_mounts" exists in cluster_config,
causing unrelated SLURM jobs to fail; modify run_exp (or the code path that
handles SLURM jobs) to only call get_sandbox_mounts_from_config and
check_remote_mount_directories when a sandbox sidecar will actually be launched
(e.g., guard the existing block with the sandbox-enabled condition used
elsewhere or a boolean like is_sandbox_requested derived from the experiment
spec), keep the same use of exit_if_failure and
check_remote_mount_directories(mount_sources, cluster_config,
exit_on_failure=...), and ensure you reference the existing functions
sandbox_mounts, get_sandbox_mounts_from_config, and
check_remote_mount_directories so the check is executed only for sandbox runs.

Collaborator Author


This is a more defensive check that fails fast before launch. It also mirrors the existing checks for "mounts" and is easier to maintain. The alternative of tracing actual sandbox usage is more complex, with marginal benefit.

Contributor


@sgunasekar, that's a fair point. The symmetry with the existing mounts preflight check is a valid reason to keep this pattern, and the NEMO_SKILLS_DISABLE_MOUNT_CHECK escape hatch already covers cases where the check needs to be bypassed. I'll defer to your judgment here.


✏️ Learnings added
Learnt from: sgunasekar
Repo: NVIDIA-NeMo/Skills PR: 1350
File: nemo_skills/pipeline/utils/exp.py:803-812
Timestamp: 2026-04-08T09:06:24.771Z
Learning: In `nemo_skills/pipeline/utils/exp.py`, `run_exp` intentionally runs `sandbox_mounts` path validation as a preflight check whenever `sandbox_mounts` is present in `cluster_config`, regardless of whether a sandbox sidecar is actually launched in that experiment. This mirrors the existing `mounts` check pattern and is deliberately chosen for simplicity and early failure. The `NEMO_SKILLS_DISABLE_MOUNT_CHECK` env var provides an opt-out.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ludwig-n
Repo: NVIDIA-NeMo/Skills PR: 1310
File: nemo_skills/inference/eval/swebench.py:415-417
Timestamp: 2026-03-17T18:33:03.834Z
Learning: In `nemo_skills/inference/eval/swebench.py`, the `cp -r {container_repo_dir} /testbed` command inside `_execute_container_command` is intentionally guarded by `mode == "agent"`. Agent frameworks (SWE-agent, OpenHands, mini-swe-agent) all expect the repo at `/testbed`, so the copy is required for agent mode. For eval mode, custom eval harnesses are expected to find the repo at its original `container_repo_dir` location. Each `_execute_container_command` call runs in a fresh `apptainer exec --writable-tmpfs` container, so nothing persists between agent and eval invocations anyway.

Comment thread nemo_skills/pipeline/utils/mounts.py
Comment thread tests/test_pipeline_utils.py
sgunasekar added a commit that referenced this pull request Apr 8, 2026
Fixes raised in #1350:

- declarative.py: guard sandbox_mounts logic with hasattr(script, 'keep_mounts')
  so non-sandbox scripts (server, client) always inherit cluster mounts instead
  of receiving sandbox_mounts — sandbox_mounts was incorrectly applied to all
  scripts after the precedence change
- mounts.py: use direct dict access cluster_config["sandbox_mounts"] after the
  presence check instead of .get() with a silent default
- mounts.py: raise ValueError on empty source or target in a mount string
  (e.g. ":/dst" or "src:") before indexing, avoiding IndexError
- tests: add required positional args to get_executor tests (num_nodes,
  tasks_per_node, gpus_per_node, job_name, log_dir)

Signed-off-by: suriya <sgunasekar@nvidia.com>
@sgunasekar sgunasekar force-pushed the custom_sandbox_mount_paths branch from 54a9f76 to 40e189d Compare April 8, 2026 07:23
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_pipeline_utils.py (1)

399-406: ⚠️ Potential issue | 🟡 Minor

Exercise the non-sandbox guard with sandbox_mounts present.

This test never hits the regression fixed in the PR because cluster_config omits sandbox_mounts. Add a sandbox_mounts entry here and keep asserting mounts is None, otherwise Command.prepare_for_execution() could start forwarding sandbox-only mounts to regular scripts without this test failing.

Suggested test adjustment
-    _, exec_config = cmd.prepare_for_execution({"executor": "slurm"})
+    _, exec_config = cmd.prepare_for_execution(
+        {
+            "executor": "slurm",
+            "sandbox_mounts": ["/host/data:/sandbox/data:ro"],
+        }
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_pipeline_utils.py` around lines 399 - 406, The test
test_non_sandbox_command_mounts_unchanged doesn't include a sandbox_mounts entry
so it never exercises the non-sandbox guard; update the test to pass a
cluster_config that includes a sandbox_mounts value (e.g., {"sandbox_mounts":
[...]}) into Command.prepare_for_execution so the code path that ensures
non-SandboxScript commands keep mounts=None is exercised, and continue asserting
exec_config["mounts"] is None; locate the test and the call to
Command.prepare_for_execution and add the sandbox_mounts entry to the provided
cluster_config.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_pipeline_utils.py`:
- Around line 579-673: Add a test that asserts configured sandbox_mounts
override keep_mounts_for_sandbox=True for the legacy add_task path: call
add_task (function add_task) with cluster_config containing "sandbox_mounts":
["/host/data:/sandbox/data:ro"], with_sandbox=True and
keep_mounts_for_sandbox=True, patch get_executor and get_free_port as in the
other tests, then inspect mock_get_executor.call_args_list[-1].kwargs["mounts"]
and assert it equals the provided sandbox_mounts (not None); place this
alongside the existing three tests so the legacy path is covered.

---

Outside diff comments:
In `@tests/test_pipeline_utils.py`:
- Around line 399-406: The test test_non_sandbox_command_mounts_unchanged
doesn't include a sandbox_mounts entry so it never exercises the non-sandbox
guard; update the test to pass a cluster_config that includes a sandbox_mounts
value (e.g., {"sandbox_mounts": [...]}) into Command.prepare_for_execution so
the code path that ensures non-SandboxScript commands keep mounts=None is
exercised, and continue asserting exec_config["mounts"] is None; locate the test
and the call to Command.prepare_for_execution and add the sandbox_mounts entry
to the provided cluster_config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f4f4a99-3fa1-4df6-82c1-7d698a4744b4

📥 Commits

Reviewing files that changed from the base of the PR and between 54a9f76 and 40e189d.

📒 Files selected for processing (3)
  • nemo_skills/pipeline/utils/declarative.py
  • nemo_skills/pipeline/utils/mounts.py
  • tests/test_pipeline_utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • nemo_skills/pipeline/utils/declarative.py
  • nemo_skills/pipeline/utils/mounts.py

Comment thread tests/test_pipeline_utils.py
sgunasekar added a commit that referenced this pull request Apr 8, 2026
…x=True

Raised in review of PR #1350: the add_task path had no test for the
highest-priority rule — sandbox_mounts must win even when
keep_mounts_for_sandbox=True. Without it, that regression could slip
through while the declarative path stays green.

Signed-off-by: suriya <sgunasekar@nvidia.com>
sgunasekar added a commit that referenced this pull request Apr 13, 2026
Fixes raised in #1350:

- declarative.py: guard sandbox_mounts logic with hasattr(script, 'keep_mounts')
  so non-sandbox scripts (server, client) always inherit cluster mounts instead
  of receiving sandbox_mounts — sandbox_mounts was incorrectly applied to all
  scripts after the precedence change
- mounts.py: use direct dict access cluster_config["sandbox_mounts"] after the
  presence check instead of .get() with a silent default
- mounts.py: raise ValueError on empty source or target in a mount string
  (e.g. ":/dst" or "src:") before indexing, avoiding IndexError
- tests: add required positional args to get_executor tests (num_nodes,
  tasks_per_node, gpus_per_node, job_name, log_dir)

Signed-off-by: suriya <sgunasekar@nvidia.com>
sgunasekar added a commit that referenced this pull request Apr 13, 2026
…x=True

Raised in review of PR #1350: the add_task path had no test for the
highest-priority rule — sandbox_mounts must win even when
keep_mounts_for_sandbox=True. Without it, that regression could slip
through while the declarative path stays green.

Signed-off-by: suriya <sgunasekar@nvidia.com>
@sgunasekar sgunasekar force-pushed the custom_sandbox_mount_paths branch from bf00b0f to db33358 Compare April 13, 2026 16:25
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/test_pipeline_utils.py (1)

409-416: ⚠️ Potential issue | 🟡 Minor

Add the non-sandbox regression case to this matrix.

These new cases only cover SandboxScript. The earlier failure mode here was a normal Command picking up sandbox_mounts when that key existed, and test_non_sandbox_command_mounts_unchanged() still doesn't exercise that configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_pipeline_utils.py` around lines 409 - 416, Add a regression test
case covering the non-sandbox (Command) path where sandbox_mounts exists in the
config but should not affect Command mounts: update or extend
test_non_sandbox_command_mounts_unchanged to include a scenario with
sandbox_mounts present (both keep_mounts_for_sandbox true/false as needed) and
assert that the Command object's mounts remain unchanged; also ensure the matrix
comment above the SandboxScript tests documents this non-sandbox case so future
readers see that Commands must ignore sandbox_mounts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/pipeline/utils/mounts.py`:
- Around line 96-104: The current check after os.path.expandvars in mounts.py
only detects unresolved "${VAR}" patterns but misses bare "$VAR" placeholders;
update the conditional in the block that processes the local variable mount (the
code using os.path.expandvars(mount)) to use the same detection logic as
_resolve_path_placeholders — i.e., treat any leftover "$" or unbalanced "{" or
"}" as an unresolved placeholder — so replace the existing if condition with one
that checks if "$" in mount or "{" in mount or "}" in mount and then raise the
same ValueError when true.

---

Duplicate comments:
In `@tests/test_pipeline_utils.py`:
- Around line 409-416: Add a regression test case covering the non-sandbox
(Command) path where sandbox_mounts exists in the config but should not affect
Command mounts: update or extend test_non_sandbox_command_mounts_unchanged to
include a scenario with sandbox_mounts present (both keep_mounts_for_sandbox
true/false as needed) and assert that the Command object's mounts remain
unchanged; also ensure the matrix comment above the SandboxScript tests
documents this non-sandbox case so future readers see that Commands must ignore
sandbox_mounts.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e367244c-47e9-4f30-a9db-ec3429697fef

📥 Commits

Reviewing files that changed from the base of the PR and between bf00b0f and db33358.

📒 Files selected for processing (8)
  • cluster_configs/example-local.yaml
  • cluster_configs/example-slurm.yaml
  • docs/basics/cluster-configs.md
  • nemo_skills/pipeline/utils/__init__.py
  • nemo_skills/pipeline/utils/declarative.py
  • nemo_skills/pipeline/utils/exp.py
  • nemo_skills/pipeline/utils/mounts.py
  • tests/test_pipeline_utils.py
✅ Files skipped from review due to trivial changes (4)
  • cluster_configs/example-slurm.yaml
  • cluster_configs/example-local.yaml
  • nemo_skills/pipeline/utils/exp.py
  • docs/basics/cluster-configs.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemo_skills/pipeline/utils/__init__.py

Comment on lines +96 to +104
# First, expand any ${VAR} style environment variables in the entire mount string
# This allows partial path substitution like /path/${USER}/subdir
if "$" in mount:
    mount = os.path.expandvars(mount)
    if "${" in mount and "}" in mount:
        raise ValueError(
            f"Mount `{mount}` contains env variable placeholders, but the required env var is "
            f"not provided in the environment to resolve."
        )
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import os

for raw in ("${UNSET_SKILLS_VAR}:/dst", "$UNSET_SKILLS_VAR:/dst"):
    print(f"{raw} -> {os.path.expandvars(raw)}")
PY

sed -n '96,104p' nemo_skills/pipeline/utils/mounts.py

Repository: NVIDIA-NeMo/Skills

Length of output: 632


🏁 Script executed:

wc -l nemo_skills/pipeline/utils/mounts.py

Repository: NVIDIA-NeMo/Skills

Length of output: 102


🏁 Script executed:

cat -n nemo_skills/pipeline/utils/mounts.py | head -150

Repository: NVIDIA-NeMo/Skills

Length of output: 7242


🏁 Script executed:

# Search for tests related to mount validation
fd -t f "*test*mount*" -o -name "*mount*test*"

Repository: NVIDIA-NeMo/Skills

Length of output: 292


🏁 Script executed:

# Check for any comments or design notes about $VAR vs ${VAR}
rg -n "VAR|expandvars" nemo_skills/pipeline/utils/mounts.py -A 2 -B 2

Repository: NVIDIA-NeMo/Skills

Length of output: 1441


Fix unresolved $VAR placeholder check to match _resolve_path_placeholders behavior.

The validation at line 100 only rejects ${VAR} form but misses bare $VAR style. Since os.path.expandvars() processes both forms and leaves both unchanged when the env var is unset, a mount like $UNSET_SKILLS_VAR:/dst passes validation—inconsistent with _resolve_path_placeholders (line 63) which checks for any leftover $. This allows typos to become runtime errors instead of config-time errors.

Change the check from:

if "${" in mount and "}" in mount:

to match the pattern in _resolve_path_placeholders:

if "$" in mount or "{" in mount or "}" in mount:
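
The gap can be demonstrated directly with a standalone snippet mirroring the two checks discussed above (function names here are illustrative, not from the PR):

```python
import os

# os.path.expandvars() leaves both `$VAR` and `${VAR}` unchanged when the
# variable is unset, but the braced-only check catches just the second form.
os.environ.pop("UNSET_SKILLS_VAR", None)

def braced_only_check(mount: str) -> bool:
    return "${" in mount and "}" in mount

def strict_check(mount: str) -> bool:
    return "$" in mount or "{" in mount or "}" in mount

bare = os.path.expandvars("$UNSET_SKILLS_VAR:/dst")
assert bare == "$UNSET_SKILLS_VAR:/dst"  # unchanged: env var is unset
print(braced_only_check(bare))  # False -> the typo slips through
print(strict_check(bare))       # True  -> caught at config time
```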
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/utils/mounts.py` around lines 96 - 104, The current
check after os.path.expandvars in mounts.py only detects unresolved "${VAR}"
patterns but misses bare "$VAR" placeholders; update the conditional in the
block that processes the local variable mount (the code using
os.path.expandvars(mount)) to use the same detection logic as
_resolve_path_placeholders — i.e., treat any leftover "$" or unbalanced "{" or
"}" as an unresolved placeholder — so replace the existing if condition with one
that checks if "$" in mount or "{" in mount or "}" in mount and then raise the
same ValueError when true.

Introduces a new optional `sandbox_mounts` key in cluster configs that lets
users specify exactly which paths are mounted into the sandbox container, with
optional read/write mode (`src:dst` or `src:dst:ro` / `src:dst:rw`).

Mount resolution priority (highest to lowest):
  1. `sandbox_mounts` in config (non-empty) — used as-is, overrides keep_mounts_for_sandbox
  2. `keep_mounts_for_sandbox=True` with no sandbox_mounts — inherit all cluster mounts
  3. `keep_mounts_for_sandbox=False` with no sandbox_mounts — no mounts (safe default)

This makes the risky full-cluster-mount inheritance opt-out when sandbox_mounts
is explicitly configured, while remaining fully backward compatible: existing
configs without sandbox_mounts behave identically to before.

Both the declarative pipeline (Command.prepare_for_execution) and the legacy
add_task path in exp.py are updated. Sandbox mount paths are validated at
launch time via run_exp, consistent with how regular mounts are checked.

Signed-off-by: suriya <sgunasekar@nvidia.com>
Fixes raised in #1350:

- declarative.py: guard sandbox_mounts logic with hasattr(script, 'keep_mounts')
  so non-sandbox scripts (server, client) always inherit cluster mounts instead
  of receiving sandbox_mounts — sandbox_mounts was incorrectly applied to all
  scripts after the precedence change
- mounts.py: use direct dict access cluster_config["sandbox_mounts"] after the
  presence check instead of .get() with a silent default
- mounts.py: raise ValueError on empty source or target in a mount string
  (e.g. ":/dst" or "src:") before indexing, avoiding IndexError
- tests: add required positional args to get_executor tests (num_nodes,
  tasks_per_node, gpus_per_node, job_name, log_dir)

Signed-off-by: suriya <sgunasekar@nvidia.com>
…x=True

Raised in review of PR #1350: the add_task path had no test for the
highest-priority rule — sandbox_mounts must win even when
keep_mounts_for_sandbox=True. Without it, that regression could slip
through while the declarative path stays green.

Signed-off-by: suriya <sgunasekar@nvidia.com>
@sgunasekar sgunasekar force-pushed the custom_sandbox_mount_paths branch from db33358 to d003c09 Compare April 15, 2026 23:01
Collaborator

@gwarmstrong gwarmstrong left a comment


@sgunasekar small change requested to the Command construction path.

Also, can we add a functional test for the read-only mounts? Right now it looks like the tests only check for configuration differences.

    # Non-sandbox script: always inherit cluster mounts
    exec_mounts = None
else:
    sandbox_mounts = get_sandbox_mounts_from_config(cluster_config)
Collaborator


Can we do this based on a script attribute instead? I think it would be preferable to reduce the dependency on cluster_config here. The sandbox mounts can be parsed/added at the Command instantiation rather than inside this method. Wdyt?
