Optimize nested term aggregations with a shared term ordinal cache #2900

Draft
fulmicoton wants to merge 1 commit into main from paul.masurel/2892-nested-term-agg-opt
@fulmicoton
Collaborator

When a string term aggregation is nested inside another bucket
aggregation, each parent bucket previously resolved term ordinals
to strings independently via `sorted_ords_to_term_cb`. This was
redundant since sibling buckets share the same term dictionary.

Introduce `TermOrdToStrCache`, which resolves all observed term
ordinals once across all parent buckets, then stores the results
in a `StringArena` (a single contiguous `String` buffer) with
compact `StringRef` handles. The cache uses either a dense
`Vec<Option<StringRef>>` or a sparse `FxHashMap<u64, StringRef>`
depending on the ordinal range.
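
For readers skimming the description, here is a minimal sketch of the arena-plus-handle layout described above. It is not the code in this PR: the field names, the `build` and `get` signatures, the `resolve` closure (a stand-in for the term dictionary lookup), and the dense/sparse threshold are all assumptions, and `std::collections::HashMap` is used in place of `FxHashMap` so the snippet has no external dependencies.

```rust
use std::collections::HashMap;

/// Compact handle into the arena: byte offset and length (assumed layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StringRef {
    start: u32,
    len: u32,
}

/// All resolved terms live back to back in one `String` buffer.
#[derive(Default)]
pub struct StringArena {
    buffer: String,
}

impl StringArena {
    pub fn push(&mut self, s: &str) -> StringRef {
        let start = self.buffer.len() as u32;
        self.buffer.push_str(s);
        StringRef { start, len: s.len() as u32 }
    }

    pub fn get(&self, r: StringRef) -> &str {
        let start = r.start as usize;
        &self.buffer[start..start + r.len as usize]
    }
}

enum OrdIndex {
    /// Indexed directly by term ordinal.
    Dense(Vec<Option<StringRef>>),
    /// Keyed by term ordinal (the PR description mentions `FxHashMap`).
    Sparse(HashMap<u64, StringRef>),
}

pub struct TermOrdToStrCache {
    arena: StringArena,
    index: OrdIndex,
}

impl TermOrdToStrCache {
    /// `sorted_ords` must be sorted and deduplicated. `resolve` is a
    /// hypothetical stand-in for the term dictionary lookup.
    pub fn build(sorted_ords: &[u64], mut resolve: impl FnMut(u64) -> String) -> Self {
        let mut arena = StringArena::default();
        let max_ord = sorted_ords.last().copied().unwrap_or(0);
        // Assumed heuristic: go dense when the ordinal range is at most
        // 4x the number of observed ordinals.
        let use_dense = !sorted_ords.is_empty()
            && max_ord < (sorted_ords.len() as u64).saturating_mul(4);
        let index = if use_dense {
            let mut slots = vec![None; max_ord as usize + 1];
            for &ord in sorted_ords {
                slots[ord as usize] = Some(arena.push(&resolve(ord)));
            }
            OrdIndex::Dense(slots)
        } else {
            let mut map = HashMap::with_capacity(sorted_ords.len());
            for &ord in sorted_ords {
                map.insert(ord, arena.push(&resolve(ord)));
            }
            OrdIndex::Sparse(map)
        };
        Self { arena, index }
    }

    /// Returns the cached term for `ord`, if it was observed.
    pub fn get(&self, ord: u64) -> Option<&str> {
        let r = match &self.index {
            OrdIndex::Dense(slots) => (*slots.get(ord as usize)?)?,
            OrdIndex::Sparse(map) => *map.get(&ord)?,
        };
        Some(self.arena.get(r))
    }

    pub fn is_dense(&self) -> bool {
        matches!(self.index, OrdIndex::Dense(_))
    }
}
```

The point of the layout is that each term string is materialized once per segment rather than once per parent bucket, and buckets only hold 8-byte handles instead of owned `String`s.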

Also add `collect_term_ords` to the `TermAggregationMap` trait so
each map variant can export its live ordinals into a shared set.
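
As a rough illustration of that trait method, the sketch below shows two map variants exporting their live ordinals into one shared set. The method signature, the `VecTermMap` and `HashTermMap` types, and the `collect_all` helper are hypothetical and only mirror the idea; the real trait in the PR may look different.

```rust
use std::collections::{BTreeSet, HashMap};

/// Assumed shape of the trait method; the actual signature may differ.
pub trait TermAggregationMap {
    /// Adds every term ordinal that currently has a bucket to `out`.
    fn collect_term_ords(&self, out: &mut BTreeSet<u64>);
}

/// Hypothetical dense variant: index is the term ordinal, 0 means "no bucket".
pub struct VecTermMap {
    pub counts: Vec<u64>,
}

/// Hypothetical sparse variant keyed by term ordinal.
pub struct HashTermMap {
    pub counts: HashMap<u64, u64>,
}

impl TermAggregationMap for VecTermMap {
    fn collect_term_ords(&self, out: &mut BTreeSet<u64>) {
        for (ord, &count) in self.counts.iter().enumerate() {
            if count > 0 {
                out.insert(ord as u64);
            }
        }
    }
}

impl TermAggregationMap for HashTermMap {
    fn collect_term_ords(&self, out: &mut BTreeSet<u64>) {
        out.extend(self.counts.keys().copied());
    }
}

/// Gathers ordinals from every parent bucket's map into one sorted,
/// deduplicated list, ready for a single pass over the term dictionary.
pub fn collect_all(buckets: &[Box<dyn TermAggregationMap>]) -> Vec<u64> {
    let mut shared = BTreeSet::new();
    for bucket in buckets {
        bucket.collect_term_ords(&mut shared);
    }
    shared.into_iter().collect()
}
```

A `BTreeSet` is used here only because it yields ordinals in sorted order, which is what a sorted dictionary lookup wants; any set followed by a sort would do.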

Closes #2892

@fulmicoton fulmicoton marked this pull request as draft April 20, 2026 16:07
@PSeitz-dd PSeitz-dd force-pushed the paul.masurel/2892-nested-term-agg-opt branch from 5bd7fdd to 592b671 Compare April 21, 2026 09:43
@fulmicoton-dd fulmicoton-dd force-pushed the paul.masurel/2892-nested-term-agg-opt branch from 592b671 to 4b131c3 Compare April 25, 2026 16:38
@fulmicoton
Collaborator Author

This actually causes a regression, probably because term ordinals are no longer filtered out by `cut_off_buckets`, including at the first level of aggregation.


Development

Successfully merging this pull request may close these issues.

Apply optimization introduced in #2879 for other aggregations

2 participants