refactor(dataset): extract MultimodalContent base for SingleTurn/RandomPool by furionw · Pull Request #876 · ai-dynamo/aiperf

furionw · 2026-05-02T06:48:37Z

Summary

Extract MultimodalContent base class in dataset/loader/models.py holding the 8-field modality surface (text/texts, image/images, audio/audios, video/videos) plus the two shared validators (mutually-exclusive pairs and at-least-one-modality).
SingleTurn and RandomPool inherit from it; each retains its type discriminator and class-specific fields/validators (SingleTurn keeps timestamp/delay/role/session_id/output_length and a SingleTurn-only validate_timestamp_delay_exclusive validator).
No behavior change; all existing tests pass unchanged.
Sets up future modality additions to touch one class instead of two.

Picked option (A) for the timestamp/delay rule — keeping it as a separate validate_timestamp_delay_exclusive validator on SingleTurn rather than overriding the base validator. Pydantic v2 runs @model_validator(mode="after") validators independently, so the behavior is identical, the base validator's name stays truthful, and there's no super().validate_mutually_exclusive_fields() plumbing.

Test plan

`uv run pytest tests/unit/dataset/loader/ tests/unit/dataset/ -n auto` — green (1284 passed, identical to baseline)
`uv run pytest tests/unit/ -n auto` — same 8740 passed / 18 failed / 8 skipped / 20 errors as the unmodified branch tip; pre-existing flakes (timing/ZMQ/loop scheduler) unrelated to this refactor
`pre-commit run --files src/aiperf/dataset/loader/models.py` — all hooks pass; the only failure (`test-imports` flagging missing `trustme`) is pre-existing and reproduces on `main`

Summary by CodeRabbit

Refactor
- Consolidated multimodal validation logic across dataset entry types to reduce code duplication and improve system maintainability while preserving all existing functionality.

…omPool SingleTurn and RandomPool shared identical 8-field multimodal surface (text/texts/image/images/audio/audios/video/videos) and two identical validators (mutually-exclusive pairs, at-least-one-modality). Pull both into a `MultimodalContent` base; the entry classes inherit and add their type-specific fields on top. Picked option (A): the base validator covers only the 4 modality pairs, and SingleTurn adds a separate `validate_timestamp_delay_exclusive` validator for its own timestamp/delay exclusivity rule. Pydantic v2 runs validators independently, so behavior is preserved without inheritance gymnastics, and the base validator's name stays truthful (it really does only check modality pairs). No behavior change. All existing tests pass unchanged. Future modality additions touch one class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Qi Wang <qiwa@nvidia.com>

github-actions · 2026-05-02T06:48:44Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@5b6911719e3f0d0a5a4ec2a10352d788f1a99ff0

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@5b6911719e3f0d0a5a4ec2a10352d788f1a99ff0

Last updated for commit: 5b69117 • Browse code

coderabbitai · 2026-05-02T06:56:07Z

Walkthrough

A new MultimodalContent base model consolidates shared multimodal fields (text/texts, image/images, audio/audios, video/videos) and validation logic. SingleTurn and RandomPool now inherit from this base, eliminating duplicate field declarations and validators while retaining their specific logic.

Changes

Multimodal Base Model Inheritance Refactoring

Layer / File(s)	Summary
Base Model Definition `src/aiperf/dataset/loader/models.py` (lines 4–77)	New `MultimodalContent` class introduced with shared multimodal fields (`text`, `texts`, `image`, `images`, `audio`, `audios`, `video`, `videos`) and two validators: `validate_mutually_exclusive_fields` and `validate_at_least_one_modality`. Import of `Self` from `typing_extensions` added.
Model Inheritance Updates `src/aiperf/dataset/loader/models.py` (lines 79–158)	`SingleTurn` and `RandomPool` refactored to inherit from `MultimodalContent`. Duplicate modality fields and shared validators removed from both classes. `SingleTurn` retains `validate_timestamp_delay_exclusive`; `RandomPool` retains its dataset-type discriminator.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

Hopping through code, I spotted the strain—
Fields repeated again and again.
One base to rule them, with validators true,
Now SingleTurn and RandomPool shine like new! 🐰✨
Duplication vanquished, inheritance reigns supreme.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely describes the main refactoring: extracting a new MultimodalContent base class used by SingleTurn and RandomPool to eliminate duplication.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 60 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/aiperf/dataset/loader/models.py (1)
13-76: 💤 Low value

Clean extraction — MultimodalContent base is well-structured.

All modality fields carry Field(description="..."), both validators are correct, and the -> Self return types replace the old forward-reference string literals cleanly. The execution order (exclusivity check before at-least-one check) is preserved by Pydantic v2's definition-order guarantee for mode="after" validators.

One optional note: MultimodalContent is directly instantiable. Since its docstring already calls out that concrete classes must add their own type discriminator, and it is not present in the CustomDatasetT TypeVar bound, the misuse risk is low. If you want stronger enforcement you can add a @model_validator(mode="after") guard or mark it with an abstract type field — but it's not necessary here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/aiperf/dataset/loader/models.py` around lines 13 - 76, MultimodalContent
can be instantiated directly; add a post-model validator on MultimodalContent
(e.g., def validate_has_type_discriminator(self) decorated with
`@model_validator`(mode="after")) that checks for a concrete type discriminator
(for example ensure getattr(self, "type", None) is not None or that
self.__class__ is not MultimodalContent) and raise a ValueError if missing, so
concrete subclasses must provide a `type` field or the base class cannot be used
directly; implement the validator in the MultimodalContent class (name it like
validate_has_type_discriminator) to enforce this.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/aiperf/dataset/loader/models.py`:
- Around line 113-118: The exclusivity check in
validate_timestamp_delay_exclusive currently uses truthiness (if self.timestamp
and self.delay) which incorrectly treats 0/0.0 as False; change the condition to
explicitly test presence with "is not None" (e.g., if self.timestamp is not None
and self.delay is not None) so timestamp=0 or delay=0 are correctly considered
set, and keep the rest of the method (raising ValueError and returning self)
unchanged.

---

Nitpick comments:
In `@src/aiperf/dataset/loader/models.py`:
- Around line 13-76: MultimodalContent can be instantiated directly; add a
post-model validator on MultimodalContent (e.g., def
validate_has_type_discriminator(self) decorated with
`@model_validator`(mode="after")) that checks for a concrete type discriminator
(for example ensure getattr(self, "type", None) is not None or that
self.__class__ is not MultimodalContent) and raise a ValueError if missing, so
concrete subclasses must provide a `type` field or the base class cannot be used
directly; implement the validator in the MultimodalContent class (name it like
validate_has_type_discriminator) to enforce this.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 574ea801-bd9a-4f42-889c-435565af6ba7

📥 Commits

Reviewing files that changed from the base of the PR and between 8763c57 and 5b69117.

📒 Files selected for processing (1)

src/aiperf/dataset/loader/models.py

coderabbitai · 2026-05-02T06:56:10Z

+    @model_validator(mode="after")
+    def validate_timestamp_delay_exclusive(self) -> Self:
+        """Ensure timestamp and delay are not set together"""
+        if self.timestamp and self.delay:
+            raise ValueError("timestamp and delay cannot be set together")
+        return self


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pre-existing: timestamp=0 / delay=0 bypass the exclusivity check.

if self.timestamp and self.delay: evaluates to False when either value is 0 or 0.0. A payload with {"timestamp": 0, "delay": 5, "text": "..."} would pass validation silently. If timestamp=0 (start of trace) is a valid business value, consider using is not None checks instead.

🛡️ Suggested fix

- if self.timestamp and self.delay: + if self.timestamp is not None and self.delay is not None:

This behaviour predates this PR; flagging here as it becomes visible in review of the changed lines.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/aiperf/dataset/loader/models.py` around lines 113 - 118, The exclusivity check in validate_timestamp_delay_exclusive currently uses truthiness (if self.timestamp and self.delay) which incorrectly treats 0/0.0 as False; change the condition to explicitly test presence with "is not None" (e.g., if self.timestamp is not None and self.delay is not None) so timestamp=0 or delay=0 are correctly considered set, and keep the rest of the method (raising ValueError and returning self) unchanged.

codecov · 2026-05-02T06:57:26Z

Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/aiperf/dataset/loader/models.py	88.23%	1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

github-actions Bot added the refactor label May 2, 2026

coderabbitai Bot reviewed May 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(dataset): extract MultimodalContent base for SingleTurn/RandomPool#876

refactor(dataset): extract MultimodalContent base for SingleTurn/RandomPool#876
furionw wants to merge 1 commit intomainfrom
qiwa/multimodal_content

furionw commented May 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

github-actions Bot commented May 2, 2026

Uh oh!

coderabbitai Bot commented May 2, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 2, 2026

Uh oh!

codecov Bot commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

furionw commented May 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

github-actions Bot commented May 2, 2026

Try out this PR

Uh oh!

coderabbitai Bot commented May 2, 2026

Walkthrough

Changes

Estimated Code Review Effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 2, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

furionw commented May 2, 2026 •

edited by coderabbitai Bot

Loading