Optimize nested term aggregations with a shared term ordinal cache #2900
Draft
fulmicoton wants to merge 1 commit into main
5bd7fdd to 592b671
When a string term aggregation is nested inside another bucket aggregation, each parent bucket previously resolved term ordinals to strings independently via `sorted_ords_to_term_cb`. This was redundant since sibling buckets share the same term dictionary.

Introduce `TermOrdToStrCache`, which resolves all observed term ordinals once across all parent buckets, then stores the results in a `StringArena` (a single contiguous String buffer) with compact `StringRef` handles. The cache uses either a dense `Vec<Option<StringRef>>` or a sparse `FxHashMap<u64, StringRef>` depending on the ordinal range.

Also add `collect_term_ords` to the `TermAggregationMap` trait so each map variant can export its live ordinals into a shared set.

Closes #2892
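For readers who want a concrete picture of the description above, here is a minimal, self-contained sketch of the idea. It is not the code in this PR: only the names `TermOrdToStrCache`, `StringArena`, `StringRef` and `collect_term_ords` come from the description. The field layout, the `ExportTermOrds` trait, the `resolve` callback (a stand-in for the term dictionary lookup), the `is_dense` helper and the `DENSE_FACTOR` threshold are all assumptions made for illustration, and `std::collections::HashMap` is swapped in for `FxHashMap` to avoid an external dependency.

```rust
use std::collections::{BTreeSet, HashMap};

/// Compact handle into the arena: byte offset and byte length.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StringRef { start: u32, len: u32 }

/// All resolved terms stored back to back in one contiguous `String`.
#[derive(Default)]
pub struct StringArena { buf: String }

impl StringArena {
    pub fn push(&mut self, s: &str) -> StringRef {
        let start = self.buf.len() as u32;
        self.buf.push_str(s);
        StringRef { start, len: s.len() as u32 }
    }
    pub fn get(&self, r: StringRef) -> &str {
        let start = r.start as usize;
        &self.buf[start..start + r.len as usize]
    }
}

/// Hypothetical stand-in for the `collect_term_ords` method on `TermAggregationMap`.
pub trait ExportTermOrds {
    fn collect_term_ords(&self, out: &mut BTreeSet<u64>);
}

/// Illustrative bucket map: term ordinal -> document count.
impl ExportTermOrds for HashMap<u64, u64> {
    fn collect_term_ords(&self, out: &mut BTreeSet<u64>) {
        out.extend(self.keys().copied());
    }
}

enum OrdIndex {
    Dense { min_ord: u64, slots: Vec<Option<StringRef>> },
    Sparse(HashMap<u64, StringRef>),
}

pub struct TermOrdToStrCache { arena: StringArena, index: OrdIndex }

/// Assumed heuristic: go dense when the ordinal span is at most 4x the ordinal count.
const DENSE_FACTOR: u64 = 4;

impl TermOrdToStrCache {
    /// `ords` is the union of ordinals seen by every parent bucket.
    /// `resolve` is called exactly once per distinct ordinal.
    pub fn build(ords: &BTreeSet<u64>, mut resolve: impl FnMut(u64) -> String) -> Self {
        let mut arena = StringArena::default();
        let (min_ord, max_ord) = match (ords.iter().next(), ords.iter().next_back()) {
            (Some(&lo), Some(&hi)) => (lo, hi),
            _ => return Self { arena, index: OrdIndex::Sparse(HashMap::new()) },
        };
        let span = (max_ord - min_ord).saturating_add(1);
        let index = if span <= (ords.len() as u64).saturating_mul(DENSE_FACTOR) {
            // Dense: one slot per ordinal in [min_ord, max_ord].
            let mut slots = vec![None; span as usize];
            for &ord in ords {
                let term = resolve(ord);
                slots[(ord - min_ord) as usize] = Some(arena.push(&term));
            }
            OrdIndex::Dense { min_ord, slots }
        } else {
            // Sparse: only the observed ordinals are stored.
            let mut map = HashMap::with_capacity(ords.len());
            for &ord in ords {
                let term = resolve(ord);
                map.insert(ord, arena.push(&term));
            }
            OrdIndex::Sparse(map)
        };
        Self { arena, index }
    }

    pub fn get(&self, ord: u64) -> Option<&str> {
        let r = match &self.index {
            OrdIndex::Dense { min_ord, slots } => {
                let idx = ord.checked_sub(*min_ord)? as usize;
                (*slots.get(idx)?)?
            }
            OrdIndex::Sparse(map) => *map.get(&ord)?,
        };
        Some(self.arena.get(r))
    }

    pub fn is_dense(&self) -> bool {
        matches!(self.index, OrdIndex::Dense { .. })
    }
}
```

In this sketch, each sibling bucket map calls `collect_term_ords` into one shared `BTreeSet<u64>`, the cache is built once from that set, and every bucket then looks up its terms with `get` instead of walking the term dictionary again.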
592b671 to 4b131c3
Collaborator, Author:
This actually causes a regression, probably because the term ords do not get filtered out via