
feat: add niftiindex command for indexing NIfTI and NRRD images #438

Open
JoshuaSiraj wants to merge 20 commits into main from JoshuaSiraj/nifti-index

Conversation

@JoshuaSiraj
Collaborator

@JoshuaSiraj JoshuaSiraj commented Feb 25, 2026

  • Introduced a new CLI command niftiindex for crawling directories of NIfTI and NRRD images, allowing users to create an index of scan/mask pairs.
  • Added support for metadata merging and customizable file patterns.
  • Updated the CLI registry to include the new command under "Other tools for working with medical imaging data."
  • Created necessary modules and classes for handling NIfTI file parsing and indexing.

This enhancement gives users working with medical imaging data a single command for building a reusable index of their NIfTI/NRRD files.
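A minimal usage sketch of the new API (paths and patterns here are placeholder assumptions; the argument names follow the Crawler demo quoted later in the review thread):

from pathlib import Path

from imgtools.nifti.crawl import Crawler

# Hypothetical layout: point nifti_dir at your data and adjust the
# {placeholder} patterns to match how scans and masks are named.
crawler = Crawler(
    nifti_dir=Path("data/niftis"),
    scan_name_pattern="{patient_id}/images/{SeriesInstanceUID}.nii.gz",
    mask_name_pattern="{patient_id}/masks/{SeriesInstanceUID}.nii.gz",
    metadata_path=[Path("data/metadata/patients.csv")],  # optional CSV merge
    metadata_join_col="patient_id",  # required whenever metadata_path is set
    n_jobs=-1,   # joblib convention: -1 uses all available CPUs
    force=True,  # ignore any cached index and re-crawl
)
crawler.crawl()  # writes the index; output_dir defaults to <nifti_dir>/../.imgtools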

Summary by CodeRabbit

  • New Features
    • Added an "Other tools" CLI group and a new niftiindex command to crawl and index NIfTI directories, pairing scans and masks via configurable name patterns and extensions.
    • Supports optional metadata merging with join-column validation and clear CLI errors.
    • Configurable output/dataset naming, parallel processing, progress display, caching to skip redundant work, and user-facing success/error messages.

@coderabbitai
Contributor

coderabbitai Bot commented Feb 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds a new niftiindex CLI command and a NIfTI crawling subpackage: CLI registration, a Click command, a Pydantic Crawler wrapper, and a comprehensive parse_niftis module that discovers, matches, optionally introspects, merges metadata, caches results, and writes an index of NIfTI files.

Changes

  • CLI integration (src/imgtools/cli/__main__.py): imports and registers the new niftiindex command under a new "other" command group.
  • CLI command (src/imgtools/cli/niftiindex.py): new Click command niftiindex; accepts patterns, extensions, metadata, output, and parallelism options, performs validation and error handling, and invokes Crawler.
  • NIfTI crawl package init (src/imgtools/nifti/crawl/__init__.py): new package initializer re-exporting Crawler, ParseNiftiDirResult, and parse_nifti_dir.
  • Crawler wrapper (src/imgtools/nifti/crawl/crawler.py): adds Crawler (a pydantic.BaseModel) to configure and validate crawl settings, call parse_nifti_dir, manage output_dir defaults, and store results.
  • Directory parser & utilities (src/imgtools/nifti/crawl/parse_niftis.py): large new module covering pattern-to-regex parsing with normalizers (sketched below), file discovery by extensions, optional SimpleITK introspection, parallel processing, metadata path normalization/reading/merging, caching, index CSV writing, many helper functions, and public types/constants/exceptions.
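A rough sketch of the pattern-to-regex idea, for orientation only (pattern_to_regex is a hypothetical helper, not the module's actual implementation):

import re

def pattern_to_regex(pattern: str) -> re.Pattern[str]:
    """Turn a "{name}" file-name pattern into a regex with named groups."""
    def repl(m: re.Match[str]) -> str:
        # Normalize format specs: "{patient_id:d}" becomes a digits-only
        # group, while a bare "{name}" matches up to the next separator.
        name, _, spec = m.group(1).partition(":")
        body = r"\d+" if spec == "d" else r"[^/]+"
        return f"(?P<{name}>{body})"

    escaped = re.escape(pattern)  # escape literal text such as "." in ".nii.gz"
    return re.compile(re.sub(r"\\{([^{}]+)\\}", repl, escaped) + "$")

rx = pattern_to_regex("images/{patient_id:d}_{SeriesInstanceUID}.nii.gz")
match = rx.search("images/42_1.2.840.113.nii.gz")
assert match is not None and match.group("patient_id") == "42"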

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

hackathon

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title accurately summarizes the main change: adding a new CLI command for indexing NIfTI and NRRD images.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 87.50%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch JoshuaSiraj/nifti-index


@coderabbitai coderabbitai Bot left a comment
Actionable comments posted: 6

🧹 Nitpick comments (1)
src/imgtools/nifti/crawl/crawler.py (1)

116-139: Move embedded demo execution out of the library module.

The hardcoded runnable block is useful for local experiments, but keeping it in a production module reduces clarity and can surprise maintainers when this file is run directly.

🧭 Suggested cleanup
-if __name__ == "__main__":
-    # crawler = Crawler(
-    #     ...
-    # )
-    # crawler.crawl()
-
-    crawler = Crawler(
-        nifti_dir=Path("final-formatted"),
-        scan_name_pattern="images/{disease_site}/{split}/images/{patient_id:d}_{SeriesInstanceUID}.nii.gz",
-        mask_name_pattern="images/{disease_site}/{split}/masks/{patient_id:d}_{SeriesInstanceUID}.nii.gz",
-        force=True,
-        metadata_path=[Path("final-formatted/metadata/patients.csv")],
-        metadata_join_col="patient_id",
-        deep=True,
-        n_jobs=-1,
-    )
-    crawler.crawl()
+if __name__ == "__main__":
+    raise SystemExit("Use the `imgtools niftiindex` CLI command to run the crawler.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/crawler.py` around lines 116 - 139, The file
contains an embedded runnable demo in the if __name__ == "__main__": block that
instantiates Crawler and calls crawl() with hardcoded paths; move this demo out
of the library module into a separate CLI/script or examples file (e.g.,
tools/crawl_demo.py or scripts/run_crawler.py). Remove or replace the block in
src/imgtools/nifti/crawl/crawler.py so the module only exports the Crawler class
and related helpers, and ensure any example invocation (creating Crawler with
nifti_dir, scan_name_pattern, mask_name_pattern, metadata_path,
metadata_join_col, deep, n_jobs, force and calling crawler.crawl()) lives in the
new script or documentation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/imgtools/cli/niftiindex.py`:
- Around line 132-135: The generic error path in the except block (the handler
that logs via logger.exception and then raises a ClickException) uses an
unnecessary f-string for the message; update the raise in that block to use a
plain string (e.g., change raise click.ClickException(f"Crawling failed") from e
to raise click.ClickException("Crawling failed") from e) so Ruff F541 is avoided
while preserving the existing exception chaining and logging behavior.
- Around line 8-10: The DEFAULT_WORKERS calculation can produce zero or negative
values (cpu_count - 2) which will be passed as the default for --n-jobs and
break joblib.Parallel; change the logic that computes DEFAULT_WORKERS (currently
using cpu_count and DEFAULT_WORKERS) to clamp to a minimum of 1 (e.g., if
cpu_count is None use 1, else max(1, cpu_count - 2)) so that n_jobs is always a
positive integer; update any usage or documentation that references
DEFAULT_WORKERS or the CLI option n_jobs to reflect this guarded value.

In `@src/imgtools/nifti/crawl/parse_niftis.py`:
- Around line 201-205: Add a PEP 257-style docstring to the _introspect function
describing its purpose, arguments, and return value: state that it accepts fpath
(Path) and file_type (str), explain what extra data is populated in the returned
dict[str, Any] (e.g., metadata extracted during deep crawl), and note any
exceptions or side effects; place the docstring immediately under the def
_introspect(...) signature using triple quotes and keep it concise and accurate
to the implementation in _introspect.
- Around line 372-385: The cached-result return currently only checks for file
existence; update the cache validation in the block that reads crawl_cache_path
and returns ParseNiftiDirResult to verify the stored cache metadata matches the
current call parameters (compare cached "scan_name_pattern",
"mask_name_pattern", "extensions" (normalize to a tuple/set), "metadata_path"
(normalize to Paths or strings), and "metadata_join_col"); if any of these
differ, skip returning the cached result (log an informational message about
incompatibility) so callers don't receive stale results—perform these checks
before constructing and returning the ParseNiftiDirResult.
- Around line 214-216: The code currently treats any single-valued array as an
empty mask; change the condition so only an all-zero mask is considered empty
(e.g., check that the unique value equals 0 or use np.all(arr==0)), and
otherwise construct a Mask object for non-zero constant masks instead of falling
back to MedImage. Update the branch around variables arr, fpath, sitk_img so
that logger.warning is emitted only when the mask is all zeros, and when the
mask contains a single non-zero value create Mask(sitk_img) (not MedImage) to
allow Mask-specific fingerprinting.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcb5532 and 836a6b1.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (6)
  • src/imgtools/cli/__main__.py
  • src/imgtools/cli/niftiindex.py
  • src/imgtools/nifti/__init__.py
  • src/imgtools/nifti/crawl/__init__.py
  • src/imgtools/nifti/crawl/crawler.py
  • src/imgtools/nifti/crawl/parse_niftis.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/imgtools/nifti/crawl/parse_niftis.py (3)

201-205: ⚠️ Potential issue | 🟡 Minor

Add a docstring to _introspect.

_introspect is central to deep-crawl behavior but currently undocumented, which makes maintenance harder.

📝 Proposed fix
 def _introspect(
     fpath: Path,
     file_type: str,
 ) -> dict[str, t.Any]:
+    """Read one image file and return serialized fingerprint metadata."""
     extra: dict[str, t.Any] = {}

As per coding guidelines, src/**/*.py should follow PEP 257 with docstrings that are present and aligned with implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 201 - 205, Add a PEP
257 docstring to the _introspect function that briefly describes its purpose
(introspects a NIfTI file during deep-crawl), documents parameters (fpath: Path
— path to the file; file_type: str — detected file type or category), describes
the return value (dict[str, Any] containing extracted metadata and any "extra"
keys), lists important side-effects or exceptions the function may raise, and
gives a short example or note about expected keys in the returned dict to aid
maintainers; place the docstring immediately after the def _introspect(...) line
so it reflects current behavior of _introspect and the extra variable.

214-216: ⚠️ Potential issue | 🟠 Major

Fix empty-mask detection for constant non-zero masks.

On Line 214, the condition treats any single-valued mask as empty. A mask filled with non-zero values is valid and should still go through mask-specific handling.

✅ Proposed fix
-        if len(np.unique(arr)) == 1:
+        if np.count_nonzero(arr) == 0:
             logger.warning(f"Mask {fpath} is empty.")
             img = MedImage(sitk_img)
         else:
             arr[arr > 0] = 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 214 - 216, The current
check treats any single-valued mask as empty (len(np.unique(arr)) == 1),
dropping valid constant non-zero masks; change the condition to detect truly
empty masks (all zeros) — e.g., replace the test with an explicit check like
np.all(arr == 0) or (np.unique(arr).size == 1 and np.unique(arr)[0] == 0) so
only zero-valued masks trigger logger.warning and the MedImage(sitk_img)
empty-mask branch, while constant non-zero masks continue through mask-specific
handling (refer to variables arr, fpath and the MedImage(sitk_img) handling in
parse_niftis.py).
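A quick numpy demonstration of why the two tests differ (hypothetical arrays):

import numpy as np

all_zero = np.zeros((2, 2), dtype=np.uint8)  # truly empty mask
constant = np.ones((2, 2), dtype=np.uint8)   # constant but valid mask

# Both arrays have exactly one unique value, so the original test
# cannot tell them apart...
assert len(np.unique(all_zero)) == 1 and len(np.unique(constant)) == 1

# ...whereas counting non-zero voxels separates them correctly.
assert np.count_nonzero(all_zero) == 0  # empty: warn, fall back to MedImage
assert np.count_nonzero(constant) > 0   # non-empty: keep Mask-specific handling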

372-387: ⚠️ Potential issue | 🟠 Major

Validate cache compatibility before returning cached results.

This branch only checks that cache files exist. If inputs change (e.g., patterns, extensions, deep mode, metadata params), stale results can be returned silently.

🛠️ Proposed fix
-    # Use cache if available
-    if not force and index_csv_path.exists() and crawl_cache_path.exists():
-        logger.info(
-            "Loading cached crawl results.", index_csv_path=str(index_csv_path)
-        )
-        index = pd.read_csv(index_csv_path)
-        cache = json.loads(crawl_cache_path.read_text())
-        return ParseNiftiDirResult(
-            index=index,
-            index_csv_path=index_csv_path,
-            crawl_cache_path=crawl_cache_path,
-            unmatched_files=cache.get("unmatched_files", []),
-            extensions=tuple(cache.get("extensions", list(NIFTI_EXTENSIONS))),
-            deep=cache.get("deep", deep),
-            metadata_path=[Path(p) for p in cache.get("metadata_path", [])],
-            metadata_join_col=cache.get("metadata_join_col"),
-        )
-
-    # Discover files
-    resolved_extensions = _normalise_extensions(extensions)
+    resolved_extensions = _normalise_extensions(extensions)
+
+    # Use cache if available
+    if not force and index_csv_path.exists() and crawl_cache_path.exists():
+        cache = json.loads(crawl_cache_path.read_text())
+        cached_metadata_paths = [Path(p) for p in cache.get("metadata_path", [])]
+        cache_matches = (
+            cache.get("nifti_dir") == str(nifti_dir)
+            and cache.get("scan_name_pattern") == scan_name_pattern
+            and cache.get("mask_name_pattern") == mask_name_pattern
+            and tuple(cache.get("extensions", [])) == resolved_extensions
+            and cache.get("deep", deep) == deep
+            and cached_metadata_paths == metadata_paths
+            and cache.get("metadata_join_col") == metadata_join_col
+        )
+        if cache_matches:
+            logger.info(
+                "Loading cached crawl results.",
+                index_csv_path=str(index_csv_path),
+            )
+            index = pd.read_csv(index_csv_path)
+            return ParseNiftiDirResult(
+                index=index,
+                index_csv_path=index_csv_path,
+                crawl_cache_path=crawl_cache_path,
+                unmatched_files=cache.get("unmatched_files", []),
+                extensions=tuple(cache.get("extensions", list(NIFTI_EXTENSIONS))),
+                deep=cache.get("deep", deep),
+                metadata_path=cached_metadata_paths,
+                metadata_join_col=cache.get("metadata_join_col"),
+            )
+        logger.info(
+            "Ignoring incompatible cache and re-crawling.",
+            crawl_cache_path=str(crawl_cache_path),
+        )
+
+    # Discover files
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 372 - 387, The
cached-return branch for ParseNiftiDirResult currently only checks file
existence and may return stale data; modify the block that reads index_csv_path
and crawl_cache_path to validate cache metadata against current call inputs
(e.g., compare cached "patterns", "extensions", "deep",
"metadata_path"/"metadata_join_col" and any other relevant params) before
returning; if any mismatch is detected, skip the cache and proceed with a fresh
crawl. Use the existing crawl_cache_path JSON (variable cache) to pull cached
fields and compare them to the current variables (patterns, tuple(extensions) or
NIFTI_EXTENSIONS, deep, metadata_path list, metadata_join_col) and only
construct and return ParseNiftiDirResult when all checks pass.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 836a6b1 and c975132.

📒 Files selected for processing (2)
  • src/imgtools/nifti/crawl/crawler.py
  • src/imgtools/nifti/crawl/parse_niftis.py

Comment on lines +87 to +90
        self.output_dir = (
            self.output_dir or self.nifti_dir.parent / ".imgtools"
        )
        validate_output_dir(self.output_dir)

⚠️ Potential issue | 🟠 Major

Normalize output_dir before validating and passing it downstream.

On Line 90, validate_output_dir validates a resolved path internally, but self.output_dir remains unexpanded/unresolved and is later used on Line 107. This can send writes to a different location (notably when ~ is used).

🔧 Proposed fix
     def crawl(self) -> None:
         """Crawl the directory and build the index."""
-        self.output_dir = (
-            self.output_dir or self.nifti_dir.parent / ".imgtools"
-        )
+        self.output_dir = (
+            self.output_dir or self.nifti_dir.parent / ".imgtools"
+        ).expanduser().resolve()
         validate_output_dir(self.output_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/crawler.py` around lines 87 - 90, Normalize and
resolve self.output_dir before validating and using it: replace the current
assignment of self.output_dir (the attribute set to self.output_dir or
self.nifti_dir.parent / ".imgtools") with a normalized value (expanduser and
resolve or Path(...).expanduser().resolve()) and then call
validate_output_dir(self.output_dir) so downstream uses (e.g., later writes that
reference self.output_dir) operate on the resolved absolute path; ensure you
update the code paths that reference self.output_dir accordingly and keep
validate_output_dir(self.output_dir) after normalization.

@codecov

codecov Bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 67.80488% with 66 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.71%. Comparing base (9e73f82) to head (ccaa96f).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/imgtools/nifti/crawl/parse_niftis.py 64.24% 64 Missing ⚠️
src/imgtools/nifti/crawl/crawler.py 92.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #438       +/-   ##
===========================================
+ Coverage   54.60%   66.71%   +12.10%     
===========================================
  Files          66       68        +2     
  Lines        4318     4522      +204     
===========================================
+ Hits         2358     3017      +659     
+ Misses       1960     1505      -455     

☔ View full report in Codecov by Sentry.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/imgtools/cli/niftiindex.py (1)

135-135: ⚠️ Potential issue | 🟡 Minor

Drop the unnecessary f-string in the generic error path.

Line 135 uses an f-string with no placeholders.

🧹 Suggested fix
-        raise click.ClickException(f"Crawling failed") from e
+        raise click.ClickException("Crawling failed") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/cli/niftiindex.py` at line 135, Remove the unnecessary f-string
in the exception raise: change the raise statement that currently does raise
click.ClickException(f"Crawling failed") from e to use a plain string like raise
click.ClickException("Crawling failed") from e; this touches the raise of
click.ClickException in the crawl/error handling path.
src/imgtools/nifti/crawl/parse_niftis.py (2)

373-388: ⚠️ Potential issue | 🟠 Major

Validate cache parameters before returning cached results.

Line 373 currently reuses cache based only on file existence. If patterns/extensions/deep/metadata settings changed, stale results can be returned silently.

🛠️ Suggested fix
     # Use cache if available
     if not force and index_csv_path.exists() and crawl_cache_path.exists():
-        logger.info(
-            "Loading cached crawl results.", index_csv_path=str(index_csv_path)
-        )
-        index = pd.read_csv(index_csv_path)
         cache = json.loads(crawl_cache_path.read_text())
-        return ParseNiftiDirResult(
-            index=index,
-            index_csv_path=index_csv_path,
-            crawl_cache_path=crawl_cache_path,
-            unmatched_files=cache.get("unmatched_files", []),
-            extensions=tuple(cache.get("extensions", list(NIFTI_EXTENSIONS))),
-            deep=cache.get("deep", deep),
-            metadata_path=[Path(p) for p in cache.get("metadata_path", [])],
-            metadata_join_col=cache.get("metadata_join_col"),
-        )
+        resolved_extensions = _normalise_extensions(extensions)
+        cache_matches = (
+            cache.get("scan_name_pattern") == scan_name_pattern
+            and cache.get("mask_name_pattern") == mask_name_pattern
+            and tuple(cache.get("extensions", [])) == resolved_extensions
+            and cache.get("deep", deep) == deep
+            and cache.get("metadata_join_col") == metadata_join_col
+            and [str(Path(p)) for p in cache.get("metadata_path", [])]
+            == [str(Path(p)) for p in metadata_paths]
+        )
+        if cache_matches:
+            logger.info(
+                "Loading cached crawl results.", index_csv_path=str(index_csv_path)
+            )
+            index = pd.read_csv(index_csv_path)
+            return ParseNiftiDirResult(
+                index=index,
+                index_csv_path=index_csv_path,
+                crawl_cache_path=crawl_cache_path,
+                unmatched_files=cache.get("unmatched_files", []),
+                extensions=tuple(cache.get("extensions", list(NIFTI_EXTENSIONS))),
+                deep=cache.get("deep", deep),
+                metadata_path=[Path(p) for p in cache.get("metadata_path", [])],
+                metadata_join_col=cache.get("metadata_join_col"),
+            )
+        logger.info("Ignoring incompatible crawl cache and re-crawling.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 373 - 388, Load and
inspect the JSON cache from crawl_cache_path and validate its stored parameters
(extensions, deep, metadata_path, metadata_join_col, and any pattern-related
fields) against the current call parameters (e.g., the function's extensions
list/tuple, deep flag, metadata_path/metadata_join_col, and any pattern inputs)
before returning the cached ParseNiftiDirResult; if any of these don't match
(types normalized: compare tuples for extensions, booleans for deep, Path lists
for metadata_path, and strings/None for metadata_join_col), treat the cache as
stale and fall through to re-crawl instead of returning it. Ensure you keep the
existing symbols: index_csv_path, crawl_cache_path, ParseNiftiDirResult,
NIFTI_EXTENSIONS, force, deep, metadata_path, and metadata_join_col so reviewers
can locate and update the validation logic.

201-207: ⚠️ Potential issue | 🟡 Minor

Move _introspect docstring to the first statement.

Line 205 executes before the string on Line 206, so _introspect.__doc__ is not set.

🛠️ Suggested fix
 def _introspect(
     fpath: Path,
     file_type: str,
 ) -> dict[str, t.Any]:
-    extra: dict[str, t.Any] = {}
-    """Read one image and return a serialized fingerprint payload."""
+    """Read one image and return a serialized fingerprint payload."""
+    extra: dict[str, t.Any] = {}

As per coding guidelines, "src/**/*.py: Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure ... Docstrings are present, accurate, and align with the implementation."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 201 - 207, The
docstring for _introspect is placed after executable code (extra dict
assignment) so _introspect.__doc__ is not set; move the triple-quoted docstring
to be the first statement immediately after the def _introspect(...) line
(before the assignment to extra) and remove the misplaced string later so the
function-level docstring is properly recognized; refer to the _introspect
function and the extra variable to locate and fix the placement.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c975132 and 2f50b25.

⛔ Files ignored due to path filters (1)
  • .gitignore is excluded by none and included by none
📒 Files selected for processing (2)
  • src/imgtools/cli/niftiindex.py
  • src/imgtools/nifti/crawl/parse_niftis.py

Comment on lines +72 to +76
    "--n-jobs",
    type=int,
    default=DEFAULT_WORKERS,
    help="Number of jobs to use for parallel processing.",
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the actual code at lines 70-130 in the file
cat -n src/imgtools/cli/niftiindex.py | sed -n '70,130p'

Repository: bhklab/med-imagetools

Length of output: 2436


🌐 Web query:

joblib.Parallel n_jobs=0 behavior rejection

💡 Result:

In joblib.Parallel, n_jobs=0 is invalid and is explicitly rejected: joblib raises a ValueError when n_jobs == 0. [1][2]

Use instead:

  • n_jobs=1 for sequential execution (no parallelism). [1]
  • n_jobs=None to mean “unset” (treated as 1 unless overridden via parallel_config). [1]
  • n_jobs=-1 to use all available CPUs (and -2, etc., for “all but one”, etc.). [1]

🏁 Script executed:

#!/bin/bash
# Find DEFAULT_WORKERS definition
rg "DEFAULT_WORKERS" src/imgtools/cli/niftiindex.py -B 2 -A 2

Repository: bhklab/med-imagetools

Length of output: 306


🏁 Script executed:

#!/bin/bash
# Check Crawler class to verify n_jobs is passed to joblib.Parallel
fd "crawl.py" src/imgtools/nifti/ --exec cat -n {} \; | head -100

Repository: bhklab/med-imagetools

Length of output: 47


🏁 Script executed:

#!/bin/bash
# Search for Crawler class and joblib usage
rg "class Crawler" src/imgtools/ -A 20

Repository: bhklab/med-imagetools

Length of output: 3479


🏁 Script executed:

#!/bin/bash
# Search for joblib.Parallel usage in crawl files
rg "Parallel" src/imgtools/ -B 3 -A 3

Repository: bhklab/med-imagetools

Length of output: 6374


Add CLI validation for --n-jobs parameter to reject invalid values early.

The CLI currently accepts any integer for --n-jobs (line 72-73) without validation. Invalid values like 0 or negative values below -1 are passed directly to the Crawler at line 123, which forwards them to joblib.Parallel. Since joblib.Parallel explicitly rejects n_jobs=0 and only accepts -1 (all CPUs) or positive integers, invalid input produces downstream errors instead of a clear CLI message.

Add validation before crawler execution:

Suggested fix
 def niftiindex(
@@
 ) -> None:
@@
     if metadata_path and not metadata_join_col:
         raise click.UsageError("--metadata-join-col is required when --metadata-path is set.")
+    if n_jobs == 0 or n_jobs < -1:
+        raise click.UsageError("--n-jobs must be -1 or a positive integer.")
@@
     crawler = Crawler(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/cli/niftiindex.py` around lines 72 - 76, Validate the parsed
--n-jobs value before creating the Crawler: ensure it is either -1 or a positive
integer (>=1) and reject 0 or any value < -1 with a clear CLI error. Implement
the check where arguments are parsed (the option using DEFAULT_WORKERS) and call
parser.error (or raise SystemExit with a message) if invalid so the user sees an
immediate, descriptive message instead of letting Crawler pass the bad value
into joblib.Parallel; reference the --n-jobs option, DEFAULT_WORKERS, and the
Crawler/joblib.Parallel handoff when adding the validation.
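A guarded default consistent with both findings, as a sketch (assumes DEFAULT_WORKERS is computed at import time in niftiindex.py):

import os
from joblib import Parallel, delayed

# Clamp so the default is always a positive integer, even on machines
# that report one or two CPUs (or where os.cpu_count() returns None).
cpu_count = os.cpu_count()
DEFAULT_WORKERS = 1 if cpu_count is None else max(1, cpu_count - 2)

def square(x: int) -> int:
    return x * x

# joblib semantics the review relies on: -1 means all CPUs, 1 is
# sequential, and 0 raises ValueError (hence the early CLI check).
results = Parallel(n_jobs=DEFAULT_WORKERS)(delayed(square)(i) for i in range(4))
assert results == [0, 1, 4, 9]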

@jjjermiah
Contributor

lgtm

…tern

- Introduced `shared_keys` to link masks to their referenced scans, enhancing metadata handling.
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/imgtools/nifti/crawl/parse_niftis.py (1)

420-436: ⚠️ Potential issue | 🟠 Major

Cache reuse does not validate parameter compatibility.

The cache is loaded based solely on file existence, without verifying that the cached results match the current call parameters (scan_name_pattern, mask_name_pattern, extensions, metadata_path, metadata_join_col). If a user changes patterns but doesn't set force=True, they'll silently receive stale results.

🛠️ Suggested fix
     if not force and index_csv_path.exists() and crawl_cache_path.exists():
-        logger.warning(
-            "Loading cached crawl results, use force=True to re-crawl.", index_csv_path=str(index_csv_path)
-        )
-        index = pd.read_csv(index_csv_path)
         cache = json.loads(crawl_cache_path.read_text())
-        return ParseNiftiDirResult(
-            index=index,
-            index_csv_path=index_csv_path,
-            crawl_cache_path=crawl_cache_path,
-            unmatched_files=cache.get("unmatched_files", []),
-            extensions=tuple(cache.get("extensions", list(NIFTI_EXTENSIONS))),
-            deep=cache.get("deep", deep),
-            shared_keys=cache.get("shared_keys", []),
-            metadata_path=[Path(p) for p in cache.get("metadata_path", [])],
-            metadata_join_col=cache.get("metadata_join_col"),
-        )
+        resolved_extensions = _normalise_extensions(extensions)
+        cached_metadata_paths = [Path(p) for p in cache.get("metadata_path", [])]
+        cache_matches = (
+            cache.get("scan_name_pattern") == scan_name_pattern
+            and cache.get("mask_name_pattern") == mask_name_pattern
+            and tuple(cache.get("extensions", [])) == resolved_extensions
+            and cache.get("metadata_join_col") == metadata_join_col
+            and cached_metadata_paths == _normalise_metadata_paths(metadata_path)
+        )
+        if cache_matches:
+            logger.warning(
+                "Loading cached crawl results, use force=True to re-crawl.",
+                index_csv_path=str(index_csv_path),
+            )
+            index = pd.read_csv(index_csv_path)
+            return ParseNiftiDirResult(
+                index=index,
+                index_csv_path=index_csv_path,
+                crawl_cache_path=crawl_cache_path,
+                unmatched_files=cache.get("unmatched_files", []),
+                extensions=tuple(cache.get("extensions", list(NIFTI_EXTENSIONS))),
+                deep=cache.get("deep", deep),
+                shared_keys=cache.get("shared_keys", []),
+                metadata_path=cached_metadata_paths,
+                metadata_join_col=cache.get("metadata_join_col"),
+            )
+        else:
+            logger.info(
+                "Cache parameters mismatch, re-crawling.",
+                index_csv_path=str(index_csv_path),
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 420 - 436, When
reusing the cache in parse_niftis.py (the block that reads index_csv_path and
crawl_cache_path and returns ParseNiftiDirResult), validate that the cached
parameters match the current call parameters (at minimum scan_name_pattern,
mask_name_pattern, extensions, metadata_path, metadata_join_col and deep if
applicable) by reading the JSON into cache and comparing cache.get(...) values
to the local variables (e.g., scan_name_pattern, mask_name_pattern,
tuple(extensions), [Path(...) for metadata_path], metadata_join_col, deep); if
any mismatch is detected, log a warning indicating which parameter(s) differ and
skip returning the cached result (allow normal re-crawl unless force=True); keep
the existing behavior when they match.
🧹 Nitpick comments (2)
src/imgtools/nifti/crawl/parse_niftis.py (2)

507-519: Consider documenting behavior when metadata files have overlapping columns.

If multiple metadata files share column names (beyond the join column), pandas will create suffixed columns (_x, _y). This is standard behavior but may surprise users. A brief note in the docstring or a log warning when this occurs would improve clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 507 - 519, When
merging external metadata files (loop over metadata_paths using
_read_metadata_file and index.merge on metadata_join_col) detect overlapping
non-join column names before calling index.merge; compute overlaps =
(set(meta.columns) - {metadata_join_col}) ∩ (set(index.columns) -
{metadata_join_col}) and if non-empty emit a logger.warning listing the
overlapping column names and the metadata_path so users know pandas will create
suffixed columns (and optionally update the function/module docstring to note
this behavior).
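A sketch of that overlap warning (the helper name and left-merge strategy are illustrative; real code would use the module's logger rather than print):

import pandas as pd

def merge_with_overlap_warning(
    index: pd.DataFrame, meta: pd.DataFrame, join_col: str
) -> pd.DataFrame:
    """Merge one metadata table into the index, flagging column collisions."""
    overlap = (set(meta.columns) - {join_col}) & (set(index.columns) - {join_col})
    if overlap:
        # pandas suffixes colliding columns with _x/_y by default.
        print(f"Overlapping metadata columns {sorted(overlap)}; expect _x/_y suffixes.")
    return index.merge(meta, on=join_col, how="left")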

260-264: Consider narrowing the exception type for clarity.

Catching bare Exception can mask unexpected programming errors. For image I/O operations, consider catching more specific exceptions like RuntimeError (from SimpleITK) or OSError.

♻️ Suggested refinement
     if deep:
         try:
             record.update(_introspect(fpath, file_type))
-        except Exception as e:
+        except (RuntimeError, OSError) as e:
             logger.error(f"Error reading image {fpath}: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 260 - 264, The current
except block in the try around _introspect(fpath, file_type) catches all
Exceptions; change it to catch specific I/O errors (e.g., except (RuntimeError,
OSError) as e:) so image-reading failures are logged via logger.error(f"Error
reading image {fpath}: {e}"), and let unexpected exceptions propagate (or
re-raise them) instead of swallowing them—locate the try/except that wraps
_introspect, fpath, file_type, record and update the exception handling
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/imgtools/nifti/crawl/parse_niftis.py`:
- Around line 203-228: Move the docstring so it immediately follows the
_introspect(...) function signature (before any statements) to be recognized as
the function docstring; also ensure img cannot be unbound by adding an explicit
branch or guard for unsupported file_type values (e.g., raise a ValueError if
file_type is not "scan" or "mask") so references to img (used to build extra via
img.serialized_fingerprint) are always valid; refer to symbols _introspect,
file_type, img, extra, MedImage, and Mask when making these edits.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f50b25 and 4082d03.

📒 Files selected for processing (1)
  • src/imgtools/nifti/crawl/parse_niftis.py

@skim2257
Collaborator

skim2257 commented Mar 3, 2026

@jjjermiah

lgtm

the prodigal son appears

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/imgtools/nifti/crawl/parse_niftis.py (2)

215-240: ⚠️ Potential issue | 🟠 Major

Move _introspect docstring to the first statement and guard invalid file_type.

Right now the string on Line 220 is not a function docstring, and img can be unbound if file_type is unexpected.

🛠️ Proposed fix
 def _introspect(
     fpath: Path,
     file_type: str,
 ) -> dict[str, t.Any]:
-    extra: dict[str, t.Any] = {}
-    """Read one image and return a serialized fingerprint payload."""
+    """Read one image and return a serialized fingerprint payload."""
+    extra: dict[str, t.Any] = {}
@@
     elif file_type == "mask":
         arr = sitk.GetArrayFromImage(sitk_img)
@@
             _sitk_img = sitk.GetImageFromArray(arr)
             _sitk_img.CopyInformation(sitk_img)
             img = Mask(_sitk_img, metadata={})
+    else:
+        raise ValueError(f"Unsupported file_type: {file_type!r}")

As per coding guidelines, "src/**/*.py: Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure ... Docstrings are present, accurate, and align with the implementation."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 215 - 240, The
_introspect function's docstring is placed after code and the variable img can
be unbound for unknown file_type; move the docstring to be the first statement
of _introspect and add an explicit guard for unexpected file_type values (e.g.,
raise ValueError) so img is always defined; update the branching around
file_type ("scan" and "mask") to handle only those cases and raise on others,
referencing the _introspect function, the file_type parameter, and the
img/Mask/MedImage branches when making the change.

432-448: ⚠️ Potential issue | 🟠 Major

Validate cache metadata before returning cached results.

This block returns cached output based only on file existence. If pattern/extensions/metadata args changed, stale index data can be returned silently.

🛠️ Proposed fix
     if not force and index_csv_path.exists() and crawl_cache_path.exists():
-        logger.warning(
-            "Loading cached crawl results, use force=True to re-crawl.", index_csv_path=str(index_csv_path)
-        )
-        index = pd.read_csv(index_csv_path)
         cache = json.loads(crawl_cache_path.read_text())
+        resolved_extensions = _normalise_extensions(extensions)
+        cache_matches = (
+            cache.get("scan_name_pattern") == scan_name_pattern
+            and cache.get("mask_name_pattern") == mask_name_pattern
+            and tuple(cache.get("extensions", [])) == resolved_extensions
+            and cache.get("metadata_join_col") == metadata_join_col
+            and [Path(p) for p in cache.get("metadata_path", [])] == metadata_paths
+        )
+        if cache_matches:
+            logger.warning(
+                "Loading cached crawl results, use force=True to re-crawl.",
+                index_csv_path=str(index_csv_path),
+            )
+            index = pd.read_csv(index_csv_path)
+            return ParseNiftiDirResult(
+                index=index,
+                index_csv_path=index_csv_path,
+                crawl_cache_path=crawl_cache_path,
+                unmatched_files=cache.get("unmatched_files", []),
+                extensions=tuple(cache.get("extensions", list(NIFTI_EXTENSIONS))),
+                deep=cache.get("deep", deep),
+                shared_keys=cache.get("shared_keys", []),
+                metadata_path=[Path(p) for p in cache.get("metadata_path", [])],
+                metadata_join_col=cache.get("metadata_join_col"),
+            )
+        logger.info("Cache exists but is incompatible with current parameters; re-crawling.")
-        return ParseNiftiDirResult(
-            index=index,
-            index_csv_path=index_csv_path,
-            crawl_cache_path=crawl_cache_path,
-            unmatched_files=cache.get("unmatched_files", []),
-            extensions=tuple(cache.get("extensions", list(NIFTI_EXTENSIONS))),
-            deep=cache.get("deep", deep),
-            shared_keys=cache.get("shared_keys", []),
-            metadata_path=[Path(p) for p in cache.get("metadata_path", [])],
-            metadata_join_col=cache.get("metadata_join_col"),
-        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/imgtools/nifti/crawl/parse_niftis.py` around lines 432 - 448, The
cached-return branch in ParseNiftiDirResult blindly trusts index_csv_path and
crawl_cache_path; update it to validate cached metadata (e.g., stored "pattern",
"extensions", "metadata_path"/"metadata_join_col", "deep" or other args used to
generate the index) against the current function arguments (force, pattern,
extensions/NIFTI_EXTENSIONS, deep, metadata_join_col) before returning. If any
key mismatches or is missing, treat the cache as stale: log a warning indicating
which keys differ, and proceed to re-run the crawl logic instead of returning
cached results; only return the cached ParseNiftiDirResult when all relevant
cached parameters match the current invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/imgtools/nifti/crawl/parse_niftis.py`:
- Around line 401-404: The Returns docstring for the function that returns
ParseNiftiDirResult is out of sync: add the missing shared_keys entry to the
Returns section so it matches the ParseNiftiDirResult NamedTuple. Update the
docstring to list shared_keys (type and brief description), keep ordering
consistent with ParseNiftiDirResult, and ensure wording follows the existing
style and PEP 257 conventions (e.g., short type phrase and one-line
description). Reference: ParseNiftiDirResult and the shared_keys field in
parse_niftis.py.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7ec822d7-d6c4-4a7a-b2d9-203e514d36a3

📥 Commits

Reviewing files that changed from the base of the PR and between 4082d03 and 81099d1.

📒 Files selected for processing (1)
  • src/imgtools/nifti/crawl/parse_niftis.py
