[Feat] Add sandbox_mounts cluster config field with r/w permission support #1350

sgunasekar wants to merge 4 commits into main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds an optional cluster-config key `sandbox_mounts` controlling which paths are mounted into the sandbox container.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Config as Cluster Config
    participant Resolver as Mount Resolver
    participant Pipeline as Execution Pipeline
    participant RemoteCheck as Remote Mount Checker
    participant Executor as Sandbox Executor
    Pipeline->>Config: read cluster_config
    Pipeline->>Resolver: get_sandbox_mounts_from_config(cluster_config)
    alt sandbox_mounts present
        Resolver->>Resolver: expand ${VAR}, handle {VAR}, validate src:dst(:mode)
        Resolver-->>Pipeline: resolved mounts list
        Pipeline->>RemoteCheck: extract mount sources and verify remote dirs
    else sandbox_mounts absent
        Resolver-->>Pipeline: return None
        Pipeline->>Pipeline: consult keep_mounts_for_sandbox flag
    end
    Pipeline->>Executor: create sandbox executor with mounts (list / None / [])
    Executor-->>Pipeline: executor ready
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 4
🧹 Nitpick comments (1)
nemo_skills/pipeline/utils/mounts.py (1)
515-518: Use direct access for `sandbox_mounts` after the presence check.

Once the guard above confirms the key exists, `.get("sandbox_mounts", [])` just reintroduces a silent default on a required field.

Proposed fix:

```diff
- mounts = cluster_config.get("sandbox_mounts", [])
+ mounts = cluster_config["sandbox_mounts"]
```

As per coding guidelines, "Don't use `.get()` for accessing dictionary keys if the code expects them to be present; use direct access `data[key_name]` to fail with a clear error instead of silently corrupting data".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/pipeline/utils/mounts.py` around lines 515 - 518, The code checks for the presence of "sandbox_mounts" in cluster_config but then uses cluster_config.get("sandbox_mounts", []), which reintroduces a silent default; replace that .get call with direct access cluster_config["sandbox_mounts"] (assign to mounts) so the code fails loudly if the key is missing and preserves the earlier presence check; update the variable assignment that sets mounts to use cluster_config["sandbox_mounts"] instead of .get.
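The failure mode the guideline warns about is easy to demonstrate on a toy dict (a hypothetical config, not the real `cluster_config`):

```python
# Toy config that is missing the key we expect to be present.
cluster_config = {"mounts": ["/a:/b"]}

# Silent default: a missing or typo'd key quietly yields [] and the
# downstream code proceeds with no mounts instead of failing.
mounts_silent = cluster_config.get("sandbox_mounts", [])
print(mounts_silent)  # []

# Direct access fails loudly with KeyError, surfacing the misconfiguration.
try:
    mounts = cluster_config["sandbox_mounts"]
except KeyError as err:
    print(f"missing key: {err}")
```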
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/pipeline/utils/declarative.py`:
- Around line 249-266: The sandbox-only mounts are being applied to every script
because keep_mounts defaults True and there is no check that the script is a
SandboxScript; update the mount resolution in the block that uses
get_sandbox_mounts_from_config(cluster_config) so sandbox_mounts is only used
when self.script is a SandboxScript (e.g., guard with isinstance(self.script,
SandboxScript) or an equivalent sandbox-identifying attribute) and otherwise
fall back to the original behavior (keep_mounts -> None to inherit cluster
mounts, keep_mounts False -> []), ensuring execution_config["mounts"] is set
from exec_mounts accordingly.
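A minimal sketch of the guarded resolution this prompt asks for, assuming a `SandboxScript` class with a `keep_mounts` attribute (names taken from the review; the real classes in declarative.py differ):

```python
class SandboxScript:
    """Stand-in for the real sandbox script class in declarative.py."""
    keep_mounts = True


def resolve_exec_mounts(script, sandbox_mounts_from_config):
    """Sketch of the precedence rules, simplified from the review prompt.

    Returns None to inherit cluster mounts, [] for no mounts,
    or an explicit mount list.
    """
    if not isinstance(script, SandboxScript):
        # Non-sandbox script (server, client): always inherit cluster mounts.
        return None
    if sandbox_mounts_from_config:
        # Explicit sandbox_mounts win, even over keep_mounts=True.
        return sandbox_mounts_from_config
    # Fall back to the flag: None inherits all mounts, [] means no mounts.
    return None if script.keep_mounts else []
```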
In `@nemo_skills/pipeline/utils/exp.py`:
- Around line 803-812: The preflight mount check currently runs whenever
"sandbox_mounts" exists in cluster_config, causing unrelated SLURM jobs to fail;
modify run_exp (or the code path that handles SLURM jobs) to only call
get_sandbox_mounts_from_config and check_remote_mount_directories when a sandbox
sidecar will actually be launched (e.g., guard the existing block with the
sandbox-enabled condition used elsewhere or a boolean like is_sandbox_requested
derived from the experiment spec), keep the same use of exit_if_failure and
check_remote_mount_directories(mount_sources, cluster_config,
exit_on_failure=...), and ensure you reference the existing functions
sandbox_mounts, get_sandbox_mounts_from_config, and
check_remote_mount_directories so the check is executed only for sandbox runs.
In `@nemo_skills/pipeline/utils/mounts.py`:
- Around line 115-143: The code currently indexes mount_source[0] /
mount_target[0] which causes IndexError for empty pieces (e.g., ":/dst" or
"src:"); update the validation in the mounts parsing logic to explicitly reject
empty mount_source or mount_target before any indexing: after splitting into
mount_source, mount_target, mount_mode (and before the "{...}" handling) check
if mount_source == "" or mount_target == "" and raise a clear ValueError
describing the malformed mount entry; keep the existing mount_mode validation
and the subsequent env-var resolution for mount_source and mount_target
unchanged, but perform the emptiness check first so functions/variables named
mount_source, mount_target, and mount_mode are protected from IndexError.
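The emptiness-first ordering this prompt describes can be sketched as a standalone helper (a hypothetical `parse_mount`, simplified from the actual logic in mounts.py):

```python
def parse_mount(mount: str) -> tuple[str, str, str]:
    """Split a `src:dst` or `src:dst:mode` mount string, rejecting empty pieces.

    Hypothetical sketch: emptiness checks run before any indexing, so
    ':/dst' and 'src:' raise a clear ValueError instead of an IndexError.
    """
    parts = mount.split(":")
    if len(parts) == 2:
        mount_source, mount_target = parts
        mount_mode = "rw"  # assumed default mode
    elif len(parts) == 3:
        mount_source, mount_target, mount_mode = parts
    else:
        raise ValueError(f"Malformed mount entry: {mount!r}")

    # Reject empty pieces first, before any mount_source[0]-style indexing.
    if mount_source == "" or mount_target == "":
        raise ValueError(f"Malformed mount entry (empty source or target): {mount!r}")
    if mount_mode not in ("ro", "rw"):
        raise ValueError(f"Invalid mount mode {mount_mode!r} in {mount!r}")
    return mount_source, mount_target, mount_mode
```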
In `@tests/test_pipeline_utils.py`:
- Around line 516-546: The tests call get_executor without required parameters
so they raise TypeError before reaching mount assertions; update
test_get_executor_empty_mounts_overrides_config and
test_get_executor_explicit_mounts_overrides_config to pass the same required
args used elsewhere (e.g. num_nodes=1, tasks_per_node=1, gpus_per_node=0,
job_name="test", log_dir="/tmp") when calling get_executor so the mounts
behavior is actually exercised.
---
Nitpick comments:
In `@nemo_skills/pipeline/utils/mounts.py`:
- Around line 515-518: The code checks for the presence of "sandbox_mounts" in
cluster_config but then uses cluster_config.get("sandbox_mounts", []), which
reintroduces a silent default; replace that .get call with direct access
cluster_config["sandbox_mounts"] (assign to mounts) so the code fails loudly if
the key is missing and preserves the earlier presence check; update the variable
assignment that sets mounts to use cluster_config["sandbox_mounts"] instead of
.get.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6e7b338f-3e6e-41b0-8fd0-c139d696ec6b
📒 Files selected for processing (8)
- cluster_configs/example-local.yaml
- cluster_configs/example-slurm.yaml
- docs/basics/cluster-configs.md
- nemo_skills/pipeline/utils/__init__.py
- nemo_skills/pipeline/utils/declarative.py
- nemo_skills/pipeline/utils/exp.py
- nemo_skills/pipeline/utils/mounts.py
- tests/test_pipeline_utils.py
```python
if "sandbox_mounts" in cluster_config:
    sandbox_mounts = get_sandbox_mounts_from_config(cluster_config)
    mount_sources = [m.split(":")[0] for m in sandbox_mounts]
    LOG.info("Checking sandbox mount paths: %s", mount_sources)
    exit_if_failure = os.environ.get("NEMO_SKILLS_DISABLE_MOUNT_CHECK", "False").lower() not in (
        "1",
        "true",
        "yes",
    )
    check_remote_mount_directories(mount_sources, cluster_config, exit_on_failure=exit_if_failure)
```
Don't make sandbox_mounts a global preflight requirement.
run_exp() now validates these paths for every SLURM run that uses a cluster config containing sandbox_mounts, even if the experiment never launches a sandbox sidecar. That can fail unrelated training/inference jobs on sandbox-only paths, so this check needs to be gated on actual sandbox usage rather than config presence alone.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/pipeline/utils/exp.py` around lines 803 - 812, The preflight
mount check currently runs whenever "sandbox_mounts" exists in cluster_config,
causing unrelated SLURM jobs to fail; modify run_exp (or the code path that
handles SLURM jobs) to only call get_sandbox_mounts_from_config and
check_remote_mount_directories when a sandbox sidecar will actually be launched
(e.g., guard the existing block with the sandbox-enabled condition used
elsewhere or a boolean like is_sandbox_requested derived from the experiment
spec), keep the same use of exit_if_failure and
check_remote_mount_directories(mount_sources, cluster_config,
exit_on_failure=...), and ensure you reference the existing functions
sandbox_mounts, get_sandbox_mounts_from_config, and
check_remote_mount_directories so the check is executed only for sandbox runs.
This is a more defensive check that fails really fast before launch. This also mimics the checks for "mounts" and easier to maintain as code. The alternate of tracing sandbox usage is more complex with marginal benefits.
@sgunasekar, that's a fair point. The symmetry with the existing mounts preflight check is a valid reason to keep this pattern, and the NEMO_SKILLS_DISABLE_MOUNT_CHECK escape hatch already covers cases where the check needs to be bypassed. I'll defer to your judgment here.
✏️ Learnings added
Learnt from: sgunasekar
Repo: NVIDIA-NeMo/Skills PR: 1350
File: nemo_skills/pipeline/utils/exp.py:803-812
Timestamp: 2026-04-08T09:06:24.771Z
Learning: In `nemo_skills/pipeline/utils/exp.py`, `run_exp` intentionally runs `sandbox_mounts` path validation as a preflight check whenever `sandbox_mounts` is present in `cluster_config`, regardless of whether a sandbox sidecar is actually launched in that experiment. This mirrors the existing `mounts` check pattern and is deliberately chosen for simplicity and early failure. The `NEMO_SKILLS_DISABLE_MOUNT_CHECK` env var provides an opt-out.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: ludwig-n
Repo: NVIDIA-NeMo/Skills PR: 1310
File: nemo_skills/inference/eval/swebench.py:415-417
Timestamp: 2026-03-17T18:33:03.834Z
Learning: In `nemo_skills/inference/eval/swebench.py`, the `cp -r {container_repo_dir} /testbed` command inside `_execute_container_command` is intentionally guarded by `mode == "agent"`. Agent frameworks (SWE-agent, OpenHands, mini-swe-agent) all expect the repo at `/testbed`, so the copy is required for agent mode. For eval mode, custom eval harnesses are expected to find the repo at its original `container_repo_dir` location. Each `_execute_container_command` call runs in a fresh `apptainer exec --writable-tmpfs` container, so nothing persists between agent and eval invocations anyway.
Fixes raised in #1350:

- declarative.py: guard sandbox_mounts logic with hasattr(script, 'keep_mounts') so non-sandbox scripts (server, client) always inherit cluster mounts instead of receiving sandbox_mounts — sandbox_mounts was incorrectly applied to all scripts after the precedence change
- mounts.py: use direct dict access cluster_config["sandbox_mounts"] after the presence check instead of .get() with a silent default
- mounts.py: raise ValueError on empty source or target in a mount string (e.g. ":/dst" or "src:") before indexing, avoiding IndexError
- tests: add required positional args to get_executor tests (num_nodes, tasks_per_node, gpus_per_node, job_name, log_dir)

Signed-off-by: suriya <sgunasekar@nvidia.com>
Force-pushed 54a9f76 to 40e189d (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_pipeline_utils.py (1)
399-406: ⚠️ Potential issue | 🟡 Minor: Exercise the non-sandbox guard with `sandbox_mounts` present.

This test never hits the regression fixed in the PR because `cluster_config` omits `sandbox_mounts`. Add a `sandbox_mounts` entry here and keep asserting `mounts is None`, otherwise `Command.prepare_for_execution()` could start forwarding sandbox-only mounts to regular scripts without this test failing.

Suggested test adjustment:

```diff
- _, exec_config = cmd.prepare_for_execution({"executor": "slurm"})
+ _, exec_config = cmd.prepare_for_execution(
+     {
+         "executor": "slurm",
+         "sandbox_mounts": ["/host/data:/sandbox/data:ro"],
+     }
+ )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_pipeline_utils.py` around lines 399 - 406, The test test_non_sandbox_command_mounts_unchanged doesn't include a sandbox_mounts entry so it never exercises the non-sandbox guard; update the test to pass a cluster_config that includes a sandbox_mounts value (e.g., {"sandbox_mounts": [...]}) into Command.prepare_for_execution so the code path that ensures non-SandboxScript commands keep mounts=None is exercised, and continue asserting exec_config["mounts"] is None; locate the test and the call to Command.prepare_for_execution and add the sandbox_mounts entry to the provided cluster_config.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_pipeline_utils.py`:
- Around line 579-673: Add a test that asserts configured sandbox_mounts
override keep_mounts_for_sandbox=True for the legacy add_task path: call
add_task (function add_task) with cluster_config containing "sandbox_mounts":
["/host/data:/sandbox/data:ro"], with_sandbox=True and
keep_mounts_for_sandbox=True, patch get_executor and get_free_port as in the
other tests, then inspect mock_get_executor.call_args_list[-1].kwargs["mounts"]
and assert it equals the provided sandbox_mounts (not None); place this
alongside the existing three tests so the legacy path is covered.
---
Outside diff comments:
In `@tests/test_pipeline_utils.py`:
- Around line 399-406: The test test_non_sandbox_command_mounts_unchanged
doesn't include a sandbox_mounts entry so it never exercises the non-sandbox
guard; update the test to pass a cluster_config that includes a sandbox_mounts
value (e.g., {"sandbox_mounts": [...]}) into Command.prepare_for_execution so
the code path that ensures non-SandboxScript commands keep mounts=None is
exercised, and continue asserting exec_config["mounts"] is None; locate the test
and the call to Command.prepare_for_execution and add the sandbox_mounts entry
to the provided cluster_config.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6f4f4a99-3fa1-4df6-82c1-7d698a4744b4
📒 Files selected for processing (3)
- nemo_skills/pipeline/utils/declarative.py
- nemo_skills/pipeline/utils/mounts.py
- tests/test_pipeline_utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
- nemo_skills/pipeline/utils/declarative.py
- nemo_skills/pipeline/utils/mounts.py
…x=True

Raised in review of PR #1350: the add_task path had no test for the highest-priority rule — sandbox_mounts must win even when keep_mounts_for_sandbox=True. Without it, that regression could slip through while the declarative path stays green.

Signed-off-by: suriya <sgunasekar@nvidia.com>
Force-pushed bf00b0f to db33358 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/test_pipeline_utils.py (1)
409-416: ⚠️ Potential issue | 🟡 Minor: Add the non-sandbox regression case to this matrix.

These new cases only cover `SandboxScript`. The earlier failure mode here was a normal `Command` picking up `sandbox_mounts` when that key existed, and `test_non_sandbox_command_mounts_unchanged()` still doesn't exercise that configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_pipeline_utils.py` around lines 409 - 416, Add a regression test case covering the non-sandbox (Command) path where sandbox_mounts exists in the config but should not affect Command mounts: update or extend test_non_sandbox_command_mounts_unchanged to include a scenario with sandbox_mounts present (both keep_mounts_for_sandbox true/false as needed) and assert that the Command object's mounts remain unchanged; also ensure the matrix comment above the SandboxScript tests documents this non-sandbox case so future readers see that Commands must ignore sandbox_mounts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_skills/pipeline/utils/mounts.py`:
- Around line 96-104: The current check after os.path.expandvars in mounts.py
only detects unresolved "${VAR}" patterns but misses bare "$VAR" placeholders;
update the conditional in the block that processes the local variable mount (the
code using os.path.expandvars(mount)) to use the same detection logic as
_resolve_path_placeholders — i.e., treat any leftover "$" or unbalanced "{" or
"}" as an unresolved placeholder — so replace the existing if condition with one
that checks if "$" in mount or "{" in mount or "}" in mount and then raise the
same ValueError when true.
---
Duplicate comments:
In `@tests/test_pipeline_utils.py`:
- Around line 409-416: Add a regression test case covering the non-sandbox
(Command) path where sandbox_mounts exists in the config but should not affect
Command mounts: update or extend test_non_sandbox_command_mounts_unchanged to
include a scenario with sandbox_mounts present (both keep_mounts_for_sandbox
true/false as needed) and assert that the Command object's mounts remain
unchanged; also ensure the matrix comment above the SandboxScript tests
documents this non-sandbox case so future readers see that Commands must ignore
sandbox_mounts.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e367244c-47e9-4f30-a9db-ec3429697fef
📒 Files selected for processing (8)
- cluster_configs/example-local.yaml
- cluster_configs/example-slurm.yaml
- docs/basics/cluster-configs.md
- nemo_skills/pipeline/utils/__init__.py
- nemo_skills/pipeline/utils/declarative.py
- nemo_skills/pipeline/utils/exp.py
- nemo_skills/pipeline/utils/mounts.py
- tests/test_pipeline_utils.py
✅ Files skipped from review due to trivial changes (4)
- cluster_configs/example-slurm.yaml
- cluster_configs/example-local.yaml
- nemo_skills/pipeline/utils/exp.py
- docs/basics/cluster-configs.md
🚧 Files skipped from review as they are similar to previous changes (1)
- nemo_skills/pipeline/utils/__init__.py
```python
# First, expand any ${VAR} style environment variables in the entire mount string
# This allows partial path substitution like /path/${USER}/subdir
if "$" in mount:
    mount = os.path.expandvars(mount)
    if "${" in mount and "}" in mount:
        raise ValueError(
            f"Mount `{mount}` contains env variable placeholders, but the required env var is "
            f"not provided in the environment to resolve."
        )
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
python - <<'PY'
import os
for raw in ("${UNSET_SKILLS_VAR}:/dst", "$UNSET_SKILLS_VAR:/dst"):
    print(f"{raw} -> {os.path.expandvars(raw)}")
PY
sed -n '96,104p' nemo_skills/pipeline/utils/mounts.py
```

Repository: NVIDIA-NeMo/Skills
Length of output: 632
🏁 Script executed:

```bash
wc -l nemo_skills/pipeline/utils/mounts.py
```

Repository: NVIDIA-NeMo/Skills
Length of output: 102
🏁 Script executed:

```bash
cat -n nemo_skills/pipeline/utils/mounts.py | head -150
```

Repository: NVIDIA-NeMo/Skills
Length of output: 7242
🏁 Script executed:

```bash
# Search for tests related to mount validation
fd -t f "*test*mount*" -o -name "*mount*test*"
```

Repository: NVIDIA-NeMo/Skills
Length of output: 292
🏁 Script executed:

```bash
# Check for any comments or design notes about $VAR vs ${VAR}
rg -n "VAR|expandvars" nemo_skills/pipeline/utils/mounts.py -A 2 -B 2
```

Repository: NVIDIA-NeMo/Skills
Length of output: 1441
Fix unresolved $VAR placeholder check to match _resolve_path_placeholders behavior.
The validation at line 100 only rejects ${VAR} form but misses bare $VAR style. Since os.path.expandvars() processes both forms and leaves both unchanged when the env var is unset, a mount like $UNSET_SKILLS_VAR:/dst passes validation—inconsistent with _resolve_path_placeholders (line 63) which checks for any leftover $. This allows typos to become runtime errors instead of config-time errors.
Change the check from:

```python
if "${" in mount and "}" in mount:
```

to match the pattern in `_resolve_path_placeholders`:

```python
if "$" in mount or "{" in mount or "}" in mount:
```
Verify each finding against the current code and only fix it if needed.
In `@nemo_skills/pipeline/utils/mounts.py` around lines 96 - 104, The current
check after os.path.expandvars in mounts.py only detects unresolved "${VAR}"
patterns but misses bare "$VAR" placeholders; update the conditional in the
block that processes the local variable mount (the code using
os.path.expandvars(mount)) to use the same detection logic as
_resolve_path_placeholders — i.e., treat any leftover "$" or unbalanced "{" or
"}" as an unresolved placeholder — so replace the existing if condition with one
that checks if "$" in mount or "{" in mount or "}" in mount and then raise the
same ValueError when true.
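The gap is easy to confirm directly: `os.path.expandvars` leaves both placeholder forms untouched when the variable is unset, so only the broader check catches the bare `$VAR` form:

```python
import os

# Ensure the variable really is unset for this demonstration.
os.environ.pop("UNSET_SKILLS_VAR", None)

for raw in ("${UNSET_SKILLS_VAR}:/dst", "$UNSET_SKILLS_VAR:/dst"):
    expanded = os.path.expandvars(raw)
    assert expanded == raw              # unresolved either way
    assert "$" in expanded              # the broader "$" check catches both
    # The narrow check misses the bare form:
    narrow_check_fires = "${" in expanded and "}" in expanded
    print(f"{raw!r}: narrow check fires = {narrow_check_fires}")
```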
Introduces a new optional `sandbox_mounts` key in cluster configs that lets users specify exactly which paths are mounted into the sandbox container, with optional read/write mode (`src:dst` or `src:dst:ro` / `src:dst:rw`).

Mount resolution priority (highest to lowest):

1. `sandbox_mounts` in config (non-empty) — used as-is, overrides keep_mounts_for_sandbox
2. `keep_mounts_for_sandbox=True` with no sandbox_mounts — inherit all cluster mounts
3. `keep_mounts_for_sandbox=False` with no sandbox_mounts — no mounts (safe default)

This makes the risky full-cluster-mount inheritance opt-out when sandbox_mounts is explicitly configured, while remaining fully backward compatible: existing configs without sandbox_mounts behave identically to before. Both the declarative pipeline (Command.prepare_for_execution) and the legacy add_task path in exp.py are updated. Sandbox mount paths are validated at launch time via run_exp, consistent with how regular mounts are checked.

Signed-off-by: suriya <sgunasekar@nvidia.com>
Force-pushed db33358 to d003c09 (Compare)
gwarmstrong
left a comment
@sgunasekar small change requested to the Command construction path.
Also, can we add a functional test for the read-only mounts? Right now it looks like the tests only check for configuration differences.
```python
    # Non-sandbox script: always inherit cluster mounts
    exec_mounts = None
else:
    sandbox_mounts = get_sandbox_mounts_from_config(cluster_config)
```
Can we do this based on a script attribute instead? I think it would be preferable to reduce the dependency on cluster_config here. The sandbox mounts can be parsed/added at the Command instantiation rather than inside this method. Wdyt?
Introduces a new optional `sandbox_mounts` key in cluster configs that lets users specify exactly which paths are mounted into the sandbox container, with optional read/write mode (`src:dst` or `src:dst:ro` / `src:dst:rw`).

Mount resolution priority (highest to lowest):

1. `sandbox_mounts` in config (non-empty) — used as-is, overrides `--keep_mounts_for_sandbox`
2. `keep_mounts_for_sandbox=True` with no `sandbox_mounts` — inherit all cluster mounts
3. `keep_mounts_for_sandbox=False` with no `sandbox_mounts` — no mounts (safe default)

Fully backward compatible: without `sandbox_mounts` in the cluster config, the existing configs and CLI flags `--with_sandbox` and `--keep_mounts_for_sandbox` behave identically to before.

Code design choices:

- Both the declarative pipeline (`Command.prepare_for_execution`) and the legacy `add_task` path in `exp.py` are updated.
- `sandbox_mounts` are validated at launch time via `run_exp`, consistent with how regular mounts are checked.
- Non-empty `sandbox_mounts` overrides the `--keep_mounts_for_sandbox` flag as the safer priority.

Summary by CodeRabbit
New Features
Behavior Changes
Documentation
Tests
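For reference, a cluster-config entry using the new key might look like this sketch (the paths here are hypothetical; the real examples live in cluster_configs/example-local.yaml and cluster_configs/example-slurm.yaml from this PR):

```yaml
# Optional: restrict what the sandbox container can see.
# Format: src:dst or src:dst:ro / src:dst:rw (rw assumed as the default mode).
# ${VAR} placeholders are expanded from the environment at resolution time.
sandbox_mounts:
  - /home/${USER}/datasets:/sandbox/datasets:ro    # read-only input data
  - /scratch/${USER}/sandbox_work:/sandbox/work:rw # writable scratch space
```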