Skip to content

fix: parse default dimension from model metadata string#307

Merged
zc277584121 merged 1 commit intozilliztech:masterfrom
BeamNawapat:feat/dimension-parsing
Apr 22, 2026
Merged

fix: parse default dimension from model metadata string#307
zc277584121 merged 1 commit intozilliztech:masterfrom
BeamNawapat:feat/dimension-parsing

Conversation

@BeamNawapat
Copy link
Copy Markdown
Contributor

Summary

Fix dimension detection for variable-dimension models. Previously hardcoded 1024 for all models with string-format dimensions. Now parses the actual default from the string.

Motivation

Models like voyage-4-nano have a default dimension of 512, not 1024. The string format "512 (default), 128, 256" contains the correct default but was ignored in favor of a hardcoded 1024, causing schema mismatches when creating collections.

Changes

  • packages/core/src/embedding/voyageai-embedding.ts — Parse leading number from dimension string with regex instead of hardcoding 1024

Test plan

  • pnpm build passes
  • Models with 1024 default still return 1024
  • Models with 512 default (e.g., voyage-4-nano) correctly return 512

Previously hardcoded 1024 for all variable-dimension models. Now
parses the actual default from the string format (e.g., "512 (default),
128, 256" correctly returns 512 for voyage-4-nano).
Copilot AI review requested due to automatic review settings April 21, 2026 03:43
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates how VoyageAIEmbedding determines the embedding vector dimension when a model’s dimension metadata is represented as a string (variable-dimension models), avoiding a previously hardcoded default.

Changes:

  • Parse the default dimension from the leading numeric token in modelInfo.dimension when it is a string.
  • Keep a fallback to 1024 if parsing fails.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +33 to +35
// Parse default dimension from string like "1024 (default), 256, 512, 2048"
const match = modelInfo.dimension.match(/^(\d+)/);
this.dimension = match ? parseInt(match[1], 10) : 1024;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description/test plan mentions fixing default dimensions for models like voyage-4-nano (512 default), but this file’s getSupportedModels() does not include that model (and all current string-form dimensions start with 1024). As a result, voyage-4-nano will still hit the unknown-model fallback and keep using 1024, so the reported schema mismatch would persist. Consider adding the missing model(s) to getSupportedModels() (with the correct dimension string) or introducing a safer fallback for unknown models (e.g., detect dimension from an embed response length).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

@zc277584121 zc277584121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Parsing the default dimension from the string format (e.g., "1024 (default), 256, 512, 2048") is more robust than hardcoding 1024, especially now that voyage-4-nano has a different default (512).

LGTM.

@zc277584121 zc277584121 merged commit 1ebda84 into zilliztech:master Apr 22, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants