feat: artifact-root resolution for external plugins#1264
feat: artifact-root resolution for external plugins#1264gwarmstrong wants to merge 2 commits intoNVIDIA-NeMo:mainfrom
Conversation
Adds NEMO_GYM_EXTRA_ROOTS, NEMO_GYM_ROOT, and NEMO_GYM_ALLOW_ROOT_OVERRIDE
environment variables so resources servers, agents, model servers, configs,
env.yaml, prompts, rollout inputs, and benchmarks can be hosted in any
plugin repo with the conventional `<root>/{resources_servers,benchmarks,...}`
layout.
A single resolve_artifact() helper unifies the five fallback sites
(cli.py, global_config.load_extra_config_paths, env.yaml, rollout_collection,
prompt.py) and is also used by the new cross-root benchmark discovery in
benchmarks.list_benchmarks. By default, an artifact present in more than
one root raises ArtifactCollisionError; set NEMO_GYM_ALLOW_ROOT_OVERRIDE=1
to fall back to earliest-wins precedence (cwd > extras > NEMO_GYM_ROOT_DIR).
sys.path is augmented so plugin prepare.py modules are importable, with the
same priority order so override mode resolves consistently.
When no env vars are set the resolver collapses to [cwd, PARENT_DIR] and the
realpath-dedup means cwd-equals-Gym (the in-repo case) sees one root,
preserving identical pre-Phase-1 behavior.
Documents the workflow in the Configuration Reference under "Developing
External Artifacts" and adds 56 unit tests covering strict/override modes,
realpath dedup, symlinked roots, validator-filtered collisions, sys.path
ordering, and bool-env-var parsing. 262/262 unit tests pass.
Signed-off-by: gwarmstrong <gwarmstrong@users.noreply.github.com>
|
|
||
| ## Developing External Artifacts | ||
|
|
||
| Three environment variables control where Gym searches for resources servers, model servers, agents, configs, `env.yaml`, prompts, rollout inputs, and benchmarks. Together they let you develop plugins in a separate repository without forking Gym or merging every iteration upstream. |
There was a problem hiding this comment.
Maybe worth clarifying default behavior, which IIRC is:
local install searches Gym/...
pip install checks cwd first, then fallsback to site-packages install location for builtin environment components
There was a problem hiding this comment.
So there is a slight change here--now the resolution order is always cwd -> extras -> install, independent of a local install or pip install. Which I think is fairly ergonomic and is more consistent with how system references (e.g., putting things on PYTHONPATH) is resolved? I think it could be confusing to deviate from that standard. But if you anticipate major backwards incompatibility implications of this change, we can obviously revert it.
In either case, I'm happy to make the docs more clear once we've decided which route to take.
|
I think that environment abstraction like benchmarks/ but for training too will help this too. |
|
we should test pypi wheel on this changes if its not in the latest ci |
feat: artifact-root resolution for external plugins
Summary
Adds three environment variables that let resources servers, agents, model
servers, configs,
env.yaml, prompts, rollout inputs, and benchmarks live inany repo with the conventional
<root>/{resources_servers,benchmarks,...}layout — without forking Gym or merging every iteration upstream.
NEMO_GYM_EXTRA_ROOTScwdand the install location.NEMO_GYM_ROOTNEMO_GYM_ALLOW_ROOT_OVERRIDEWhy
Today every benchmark, resources server, agent, and config has to live in
Gym/{resources_servers,benchmarks,...}/<name>/. Five call sites(
cli.py,global_config.load_extra_config_paths,env.yaml,rollout_collection.py,prompt.py) implemented partial cwd-fallback inslightly different ways, and benchmark discovery had no fallback at all —
so a plugin benchmark in another repo was invisible to
ng_list_benchmarksand
ng_prepare_benchmark.This PR replaces those five fallback sites with a single
resolve_artifact()helper, threads the same search list through benchmark discovery, and
exposes the search list via env vars so plugin repos can opt in without
patching Gym.
How it resolves
For every artifact reference Gym searches in this order:
NEMO_GYM_EXTRA_ROOTS, in listed orderNEMO_GYM_ROOT(defaults to the install location)Each root uses the same internal layout:
Each artifact is resolved independently against the same list, so plugins
from different repos compose:
Strict by default; opt-in override
When the same artifact (or the same benchmark name) is defined in more than
one root, Gym raises
ArtifactCollisionErrorand names every conflictingpath:
This catches accidental shadowing (for example, a stale local copy of a
server renamed upstream). When you want a plugin to take precedence
(testing a local fork in place of a shipped server), set
NEMO_GYM_ALLOW_ROOT_OVERRIDE=1and earliest-wins precedence kicks in.Roots that resolve to the same canonical filesystem path (for example,
running
ng_*from inside the Gym tree wherecwd == NEMO_GYM_ROOT, or asymlinked extra root pointing at Gym) are deduped before the collision
check, so they never false-positive.
Backward compatibility
With no env vars set,
get_artifact_roots()returns[cwd, PARENT_DIR]and the realpath dedup means cwd-equals-Gym (the in-repo case) sees a
single root. Behavior is byte-identical to the pre-Phase-1 cwd-then-PARENT_DIR
fallback that already existed at four of the five call sites. The collision
check is a new protection that only fires when extra roots are
configured, so existing setups are unaffected.