fix: parse default dimension from model metadata string #307
zc277584121 merged 1 commit into zilliztech:master from
Previously hardcoded 1024 for all variable-dimension models. Now parses the actual default from the string format (e.g., "512 (default), 128, 256" correctly returns 512 for voyage-4-nano).
Pull request overview
This PR updates how VoyageAIEmbedding determines the embedding vector dimension when a model’s dimension metadata is represented as a string (variable-dimension models), avoiding a previously hardcoded default.
Changes:
- Parse the default dimension from the leading numeric token in `modelInfo.dimension` when it is a string.
- Keep a fallback to `1024` if parsing fails.
// Parse default dimension from string like "1024 (default), 256, 512, 2048"
const match = modelInfo.dimension.match(/^(\d+)/);
this.dimension = match ? parseInt(match[1], 10) : 1024;
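For illustration, the same parsing logic can be sketched as a standalone function. This is a minimal sketch, not the code in the PR: the helper name `parseDefaultDimension` and its `fallback` parameter are hypothetical, and it assumes the default is always the leading integer of the metadata string.

```typescript
// Hypothetical helper (name and signature are illustrative, not from the PR).
// Accepts either a fixed numeric dimension or a metadata string such as
// "512 (default), 128, 256", where the default is assumed to come first.
function parseDefaultDimension(
  dimension: number | string,
  fallback: number = 1024
): number {
  if (typeof dimension === "number") {
    return dimension;
  }
  // Capture the leading run of digits; anything after it is ignored.
  const match = dimension.match(/^(\d+)/);
  return match ? parseInt(match[1], 10) : fallback;
}
```

With this shape, a string that does not start with a digit falls back to 1024, which matches the fallback behavior described in the change list.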
The PR description/test plan mentions fixing default dimensions for models like voyage-4-nano (512 default), but this file’s getSupportedModels() does not include that model (and all current string-form dimensions start with 1024). As a result, voyage-4-nano will still hit the unknown-model fallback and keep using 1024, so the reported schema mismatch would persist. Consider adding the missing model(s) to getSupportedModels() (with the correct dimension string) or introducing a safer fallback for unknown models (e.g., detect dimension from an embed response length).
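The fallback suggested above (detecting the dimension from an embed response) could look roughly like the following. This is a sketch under stated assumptions: `EmbedFn` and `detectDimension` are hypothetical names, and the embedding call is passed in as a plain function rather than tied to any specific client API.

```typescript
// Hypothetical type: any function that returns one embedding vector for a text.
type EmbedFn = (text: string) => Promise<number[]>;

// Probe the model once and use the length of the returned vector as the
// dimension. If the call fails or returns an empty vector, use the fallback.
async function detectDimension(
  embed: EmbedFn,
  fallback: number = 1024
): Promise<number> {
  try {
    const vector = await embed("dimension probe");
    return vector.length > 0 ? vector.length : fallback;
  } catch {
    // Swallowing the error is a choice made for this sketch; a real
    // implementation may prefer to surface it.
    return fallback;
  }
}
```

The trade-off is one extra embedding request for unknown models in exchange for not relying on a hardcoded default.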
zc277584121 left a comment
Good catch. Parsing the default dimension from the string format (e.g., "1024 (default), 256, 512, 2048") is more robust than hardcoding 1024, especially now that voyage-4-nano has a different default (512).
LGTM.
Summary
Fix dimension detection for variable-dimension models. Previously hardcoded 1024 for all models with string-format dimensions. Now parses the actual default from the string.
Motivation
Models like voyage-4-nano have a default dimension of 512, not 1024. The string format "512 (default), 128, 256" contains the correct default but was ignored in favor of a hardcoded 1024, causing schema mismatches when creating collections.
Changes
- `packages/core/src/embedding/voyageai-embedding.ts`: Parse leading number from dimension string with regex instead of hardcoding 1024
Test plan
- `pnpm build` passes