nh-nixos: batch nix path-info calls in nh os info#637
Conversation
Previously `get_closure_size` spawned a separate `nix path-info` process per generation. On systems with many generations this caused hundreds of sequential nix invocations. Which sucked. Fixes #636 Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: Ia9f00bc1f53835b026b0c145674761ca6a6a6964
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR introduces JSON helpers and a batched ChangesClosure Size Batching
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
crates/nh-nixos/src/generations.rs (2)
107-116: 💤 Low valueUse
Map::getfor direct lookup in the object branch.The object branch iterates all keys to find a match;
serde_json::Mapsupports direct key lookup, which is both simpler and avoids an O(N) scan when many paths share the same JSON document.♻️ Proposed simplification
- json.as_array().map_or_else( - || { - json.as_object().and_then(|obj| { - obj - .iter() - .find(|(path, _)| path.as_str() == store_path_str) - .and_then(|(_, value)| value.get("closureSize")) - .and_then(serde_json::Value::as_u64) - }) - }, + json.as_array().map_or_else( + || { + json.as_object().and_then(|obj| { + obj + .get(store_path_str) + .and_then(|value| value.get("closureSize")) + .and_then(serde_json::Value::as_u64) + }) + },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/nh-nixos/src/generations.rs` around lines 107 - 116, The object branch inside json.as_array().map_or_else currently scans obj.iter().find to match store_path_str; replace that O(N) scan with a direct lookup using the Map::get API (e.g., obj.get(store_path_str)) and then chain .and_then(|value| value.get("closureSize")).and_then(serde_json::Value::as_u64) to extract closureSize—update the code in the object closure (the one using json.as_object().and_then(|obj| { ... })) to use obj.get(...) instead of iter().find(...) for a simpler and faster lookup.
132-188: 💤 Low valueBatching design looks solid; correct fallback semantics.
The single
nix path-infoinvocation with the JSON-keyed lookup is exactly the right shape for the issue described in#636. On batch failure (process or parse error) the empty map causesdescribeto fall back to per-generationget_closure_size, which preserves correctness at the cost of speed in the error path — acceptable.One small optional perf note: each call to
closure_size_from_jsonrescans the whole JSON, so the overall cost is O(N²) in the number of generations. At typical generation counts this is negligible, but if you wanted to make it O(N) you could pre-build aHashMap<&str, u64>from the JSON once and look up by store-path string per generation. Not blocking.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/nh-nixos/src/generations.rs` around lines 132 - 188, The current get_closure_sizes_batch calls closure_size_from_json for each generation which rescans the full serde_json::Value each time (O(N²)); to make it O(N) parse the JSON once into a lookup map (e.g. HashMap<String, u64> or HashMap<&str, u64>) keyed by the store_path string, then in the generation_dirs.iter().zip(...) loop do a direct lookup in that map and convert found bytes to a string with bytes_to_gb_string or use "Unknown" when missing; keep closure_size_from_json for single-shot use but replace per-iteration JSON scanning in get_closure_sizes_batch with the prebuilt map lookup.crates/nh-nixos/src/nixos.rs (1)
1227-1233: ⚡ Quick win
list_generationsstill triggers per-generationnix path-infocalls.Passing
Nonehere causesdescribeto fall back toget_closure_size, which spawns onenix path-infoper generation — the exact pattern this PR optimizes away inOsGenerationsArgs::info.list_generationsis used byOsRollbackArgs::rollback, which never consumesclosure_sizefrom the returnedGenerationInfo, so the resulting work is also wasted on the rollback path.Two reasonable options:
- Batch here too (mirrors
info()): collect the candidate paths first, callgenerations::get_closure_sizes_batch, then passSome(size)intodescribe.- Skip computation entirely for callers that don't need sizes: pass a placeholder like
Some(String::new())orSome("Unknown".into())and avoid the nix invocation altogether. Longer-term, thedescribeAPI could be split (e.g. a separatedescribe_without_sizeor a "skip" sentinel) to make this distinction explicit.The first approach keeps behavior identical for any future consumer of
closure_size; the second avoids unnecessary process spawns on the rollback path right now.♻️ Sketch: batch in
list_generationslet mut generations = Vec::new(); - for entry in fs::read_dir(profiles_dir)? { + let entries: Vec<PathBuf> = fs::read_dir(profiles_dir)? + .filter_map(|entry| { + let entry = entry.ok()?; + let path = entry.path(); + let name = path.file_name()?.to_str()?; + (name.starts_with("system-") && name.ends_with("-link")).then_some(path) + }) + .collect(); + + let path_refs: Vec<&Path> = entries.iter().map(PathBuf::as_path).collect(); + let closure_sizes = generations::get_closure_sizes_batch(&path_refs); + + for path in &entries { + let size = closure_sizes.get(path).cloned(); + if let Some(gen_info) = generations::describe(path, size) { + generations.push(gen_info); + } + } - let entry = match entry { - Ok(e) => e, - Err(e) => { - warn!("Failed to read entry in profile directory: {}", e); - continue; - }, - }; - - let path = entry.path(); - if let Some(name) = path.file_name().and_then(|s| s.to_str()) - && name.starts_with("system-") - && name.ends_with("-link") - && let Some(gen_info) = generations::describe(&path, None) - { - generations.push(gen_info); - } - }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/nh-nixos/src/nixos.rs` around lines 1227 - 1233, list_generations currently calls generations::describe(&path, None) which causes per-generation nix path-info calls; instead collect all candidate generation paths in list_generations, call generations::get_closure_sizes_batch with that path list to obtain sizes, and then call generations::describe(path, Some(size)) for each generation so you avoid spawning one process per generation; update list_generations to mirror OsGenerationsArgs::info's batching approach (and keep generations::describe and OsRollbackArgs::rollback behavior unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/nh-nixos/src/generations.rs`:
- Around line 107-116: The object branch inside json.as_array().map_or_else
currently scans obj.iter().find to match store_path_str; replace that O(N) scan
with a direct lookup using the Map::get API (e.g., obj.get(store_path_str)) and
then chain .and_then(|value|
value.get("closureSize")).and_then(serde_json::Value::as_u64) to extract
closureSize—update the code in the object closure (the one using
json.as_object().and_then(|obj| { ... })) to use obj.get(...) instead of
iter().find(...) for a simpler and faster lookup.
- Around line 132-188: The current get_closure_sizes_batch calls
closure_size_from_json for each generation which rescans the full
serde_json::Value each time (O(N²)); to make it O(N) parse the JSON once into a
lookup map (e.g. HashMap<String, u64> or HashMap<&str, u64>) keyed by the
store_path string, then in the generation_dirs.iter().zip(...) loop do a direct
lookup in that map and convert found bytes to a string with bytes_to_gb_string
or use "Unknown" when missing; keep closure_size_from_json for single-shot use
but replace per-iteration JSON scanning in get_closure_sizes_batch with the
prebuilt map lookup.
In `@crates/nh-nixos/src/nixos.rs`:
- Around line 1227-1233: list_generations currently calls
generations::describe(&path, None) which causes per-generation nix path-info
calls; instead collect all candidate generation paths in list_generations, call
generations::get_closure_sizes_batch with that path list to obtain sizes, and
then call generations::describe(path, Some(size)) for each generation so you
avoid spawning one process per generation; update list_generations to mirror
OsGenerationsArgs::info's batching approach (and keep generations::describe and
OsRollbackArgs::rollback behavior unchanged).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 059dc0e0-f6a2-4e52-99ff-f40afced1e26
📒 Files selected for processing (2)
crates/nh-nixos/src/generations.rscrates/nh-nixos/src/nixos.rs
|
New Definitely better than 1.5 minutes! |
Signed-off-by: NotAShelf <raf@notashelf.dev> Change-Id: I064ab02193b9f4c578ddeac68d4363ad6a6a6964
38b72c0 to
f3784d3
Compare
Previously
get_closure_sizespawned a separatenix path-infoprocess per generation. On systems with many generations this caused hundreds of sequential nix invocations. Which sucked.Fixes #636
Change-Id: Ia9f00bc1f53835b026b0c145674761ca6a6a6964
Summary by CodeRabbit
New Features
Refactor