refactor(dataset): extract MultimodalContent base for SingleTurn/RandomPool#876
refactor(dataset): extract MultimodalContent base for SingleTurn/RandomPool#876
Conversation
…omPool SingleTurn and RandomPool shared identical 8-field multimodal surface (text/texts/image/images/audio/audios/video/videos) and two identical validators (mutually-exclusive pairs, at-least-one-modality). Pull both into a `MultimodalContent` base; the entry classes inherit and add their type-specific fields on top. Picked option (A): the base validator covers only the 4 modality pairs, and SingleTurn adds a separate `validate_timestamp_delay_exclusive` validator for its own timestamp/delay exclusivity rule. Pydantic v2 runs validators independently, so behavior is preserved without inheritance gymnastics, and the base validator's name stays truthful (it really does only check modality pairs). No behavior change. All existing tests pass unchanged. Future modality additions touch one class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Qi Wang <qiwa@nvidia.com>
Try out this PRQuick install: pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@5b6911719e3f0d0a5a4ec2a10352d788f1a99ff0Recommended with virtual environment (using uv): uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@5b6911719e3f0d0a5a4ec2a10352d788f1a99ff0Last updated for commit: |
WalkthroughA new ChangesMultimodal Base Model Inheritance Refactoring
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/aiperf/dataset/loader/models.py (1)
13-76: 💤 Low valueClean extraction —
MultimodalContentbase is well-structured.All modality fields carry
Field(description="..."), both validators are correct, and the-> Selfreturn types replace the old forward-reference string literals cleanly. The execution order (exclusivity check before at-least-one check) is preserved by Pydantic v2's definition-order guarantee formode="after"validators.One optional note:
MultimodalContentis directly instantiable. Since its docstring already calls out that concrete classes must add their owntypediscriminator, and it is not present in theCustomDatasetTTypeVar bound, the misuse risk is low. If you want stronger enforcement you can add a@model_validator(mode="after")guard or mark it with an abstracttypefield — but it's not necessary here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiperf/dataset/loader/models.py` around lines 13 - 76, MultimodalContent can be instantiated directly; add a post-model validator on MultimodalContent (e.g., def validate_has_type_discriminator(self) decorated with `@model_validator`(mode="after")) that checks for a concrete type discriminator (for example ensure getattr(self, "type", None) is not None or that self.__class__ is not MultimodalContent) and raise a ValueError if missing, so concrete subclasses must provide a `type` field or the base class cannot be used directly; implement the validator in the MultimodalContent class (name it like validate_has_type_discriminator) to enforce this.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/aiperf/dataset/loader/models.py`:
- Around line 113-118: The exclusivity check in
validate_timestamp_delay_exclusive currently uses truthiness (if self.timestamp
and self.delay) which incorrectly treats 0/0.0 as False; change the condition to
explicitly test presence with "is not None" (e.g., if self.timestamp is not None
and self.delay is not None) so timestamp=0 or delay=0 are correctly considered
set, and keep the rest of the method (raising ValueError and returning self)
unchanged.
---
Nitpick comments:
In `@src/aiperf/dataset/loader/models.py`:
- Around line 13-76: MultimodalContent can be instantiated directly; add a
post-model validator on MultimodalContent (e.g., def
validate_has_type_discriminator(self) decorated with
`@model_validator`(mode="after")) that checks for a concrete type discriminator
(for example ensure getattr(self, "type", None) is not None or that
self.__class__ is not MultimodalContent) and raise a ValueError if missing, so
concrete subclasses must provide a `type` field or the base class cannot be used
directly; implement the validator in the MultimodalContent class (name it like
validate_has_type_discriminator) to enforce this.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 574ea801-bd9a-4f42-889c-435565af6ba7
📒 Files selected for processing (1)
src/aiperf/dataset/loader/models.py
| @model_validator(mode="after") | ||
| def validate_timestamp_delay_exclusive(self) -> Self: | ||
| """Ensure timestamp and delay are not set together""" | ||
| if self.timestamp and self.delay: | ||
| raise ValueError("timestamp and delay cannot be set together") | ||
| return self |
There was a problem hiding this comment.
Pre-existing: timestamp=0 / delay=0 bypass the exclusivity check.
if self.timestamp and self.delay: evaluates to False when either value is 0 or 0.0. A payload with {"timestamp": 0, "delay": 5, "text": "..."} would pass validation silently. If timestamp=0 (start of trace) is a valid business value, consider using is not None checks instead.
🛡️ Suggested fix
- if self.timestamp and self.delay:
+ if self.timestamp is not None and self.delay is not None:This behaviour predates this PR; flagging here as it becomes visible in review of the changed lines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/aiperf/dataset/loader/models.py` around lines 113 - 118, The exclusivity
check in validate_timestamp_delay_exclusive currently uses truthiness (if
self.timestamp and self.delay) which incorrectly treats 0/0.0 as False; change
the condition to explicitly test presence with "is not None" (e.g., if
self.timestamp is not None and self.delay is not None) so timestamp=0 or delay=0
are correctly considered set, and keep the rest of the method (raising
ValueError and returning self) unchanged.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Summary
MultimodalContentbase class indataset/loader/models.pyholding the 8-field modality surface (text/texts, image/images, audio/audios, video/videos) plus the two shared validators (mutually-exclusive pairs and at-least-one-modality).SingleTurnandRandomPoolinherit from it; each retains itstypediscriminator and class-specific fields/validators (SingleTurnkeepstimestamp/delay/role/session_id/output_lengthand a SingleTurn-onlyvalidate_timestamp_delay_exclusivevalidator).Picked option (A) for the timestamp/delay rule — keeping it as a separate
validate_timestamp_delay_exclusivevalidator onSingleTurnrather than overriding the base validator. Pydantic v2 runs@model_validator(mode="after")validators independently, so the behavior is identical, the base validator's name stays truthful, and there's nosuper().validate_mutually_exclusive_fields()plumbing.Test plan
Summary by CodeRabbit