feat(minor): Enable cross-module specialisation of iterators + add Histogram.add(_:) merge API#30
Merged
hassila merged 1 commit intoHdrHistogram:mainfrom Apr 24, 2026
Conversation
61a4812 to
be766c0
Compare
….add(_:)
Part 1 — @inlinable iterator hot paths
Downstream consumers that import `Histogram` from a different SwiftPM
target previously could not specialise the iterator calls because the
iterator method bodies were not visible to the SIL optimiser. Building
with `@_assemblyVision` in a release consumer emitted:
Unable to specialize generic function
"Histogram.Histogram.recordedValues()" since definition is not visible
Unable to specialize generic function
"Histogram.Histogram.RecordedValues.next()" since definition is not visible
and equivalents for every iterator family. In a profiled 3.4 GB workload
this showed up as `_swift_getGenericMetadata` /
`MetadataCacheKey::operator==` dominating the iterator hot path
(roughly 6% of main-thread wall time).
Mark the iterator hot paths `@inlinable` and their supporting types /
stored properties / transitive helpers `@usableFromInline`, so the
specialiser can inline concrete-`Count` iteration through the module
boundary:
- `IteratorImpl` and all its stored properties and methods.
- `Percentiles`, `LinearBucketValues`, `LogarithmicBucketValues`,
`RecordedValues`, `AllValues` (stored properties, init, next(), and
the private helpers they reach).
- Public accessors `recordedValues()`, `allValues()`, `percentiles(...)`,
`linearBucketValues(...)`, `logarithmicBucketValues(...)`.
- Transitive helpers: `highestEquivalentForValue`,
`lowestEquivalentForValue`, `nextNonEquivalentForValue`,
`sizeOfEquivalentRangeForValue`, `sizeOfEquivalentRangeFor`,
`valueFromIndex`, `valueFrom`, `subBucketIndexForValue`,
`bucketIndexForValue`, `countsIndexForValue`, `countsIndexFor`.
- Supporting stored fields on `Histogram`: `bucketCount`,
`subBucketMask`, `subBucketHalfCountMagnitude`,
`leadingZeroCountBase`, `unitMagnitude`.
- Computed `subBucketCount` and `subBucketHalfCount`.
- Explicit `@usableFromInline` memberwise initialiser on
`IterationValue` (the synthesised one was internal), so
`makeIterationValueAndUpdatePrev` can construct it from an
`@inlinable` context.
Part 2 — new `Histogram.add(_:)` merge API
The workload that prompted Part 1 was actually trying to merge two
histograms by iterating `recordedValues()` on one and calling
`record(_:count:)` on the other — O(recorded-values) with a log-bucket
index per value, and suffered the specialisation miss. Merging two
histograms of matching layout is really an element-wise sum of their
backing count arrays, which is what this method does:
@inlinable public mutating func add(_ other: Self)
Semantics, designed to match a replay of `other.recordedValues()` into
`self` via `record(_:count:)`:
- `numberOfSignificantValueDigits` and `lowestDiscernibleValue` must
match on both sides — those determine the bucket layout.
- Layouts match, so `self` and `other` compute the same index for a
given value. `other`'s nonzero buckets all live at indices ≤
`countsIndexForValue(other.maxValue)`. The merge touches `self`'s
public range state only when that index would not fit in `self`'s
current `counts`:
* If it fits (including the empty `other.maxValue == 0` case and
the case where `other`'s backing array is longer but its
recorded values all fit in `self`), only the common prefix is
summed. `self.counts.count` and `self.highestTrackableValue` are
not touched.
* Otherwise, if `self.autoResize == true`, `self` grows just
enough to index `other.maxValue` (not
`other.highestTrackableValue`, which can be much larger if
`other` was resized then reset).
* Otherwise, the merge is a precondition failure — silently
accepting would drop counts. This mirrors `record(_:)`, which
accepts values above the nominal `highestTrackableValue` as long
as their computed index fits in `counts` (subBucketCount is
rounded up to a power of two, so there is headroom past the
nominal ceiling).
Implementation uses `withUnsafeMutableBufferPointer` /
`withUnsafeBufferPointer` with `&+=` for a tight loop. `_totalCount`,
`maxValue`, and `minNonZeroValue` are updated accordingly.
Tests
- New `Tests/HistogramTests/HistogramMergeTests.swift` covering:
round-trip merge (with bucket-by-bucket equality and percentile
agreement), empty-other, self-add, receiver auto-resizes when
`other.maxValue` doesn't fit, merging a resized-and-reset `other`
into a fresh receiver is a strict no-op (asserts unchanged
`highestTrackableValue` and `counts.count`, not just `==` which
ignores those), merging a larger `other` that only recorded small
values does not grow the auto-resizing receiver (mirroring a replay
of `recordedValues()`), merging a shorter `other` into a longer
`self` leaves self's tail untouched, merging a longer-but-in-range
`other` into a fixed-size receiver succeeds, merging values above
the nominal `highestTrackableValue` that still fit in the backing
array succeeds, and a documentation test pinning the inputs that
would trap the precondition (catching the trap itself requires
signal-handling infra the package does not depend on).
- Un-skipped the pre-existing `testAutoSizingAdd` in
`Tests/HistogramTests/HistogramAutosizingTests.swift` (was
`throw XCTSkip("Histogram.add() is not implemented yet")`).
- Full suite: 60/60 pass.
Verifying specialisation
Minimal downstream SwiftPM target imports `Histogram` and annotates
its iterator-using functions with `@_assemblyVision`:
swift build -c release 2>&1 | tee /tmp/av.txt
grep "Unable to specialize" /tmp/av.txt
Before: 10 unique "Unable to specialize" remarks (80 emissions across
five iterator families) — every public iterator accessor and its
`next()`. After: 0. `grep "Specialized function"` now shows concrete
specialisations like `Histogram.Histogram.RecordedValues.next()` with
`(@inout Histogram<UInt64>.RecordedValues) -> @out
Optional<Histogram<UInt64>.IterationValue>` and
`Histogram.Histogram.add(_:)`.
Scope
Additive and ABI-compatible. No public type or method signature
changed. Storage layout of `Histogram` and `IteratorImpl` preserved —
only attributes added, plus the one new `add(_:)` method.
be766c0 to
fac12a0
Compare
dimlio
approved these changes
Apr 24, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
next()and all supporting helpers so downstream consumers compiling againstHistogram<Count>from another SwiftPM target can specialise on the concreteCount. Fixes the_swift_getGenericMetadata/MetadataCacheKey::operator==hot path that showed up in a profiled consumer.Histogram.add(_ other:)merge API. Direct element-wise bucket sum; the operation the motivating consumer was previously reaching for by iteratingrecordedValues()and callingrecord(_:count:)into the target.Both changes are additive and ABI-compatible (no public type/method signature changed or removed).
Context
Profiling a downstream consumer on a 3.4 GB workload showed
_swift_getGenericMetadata/swift::MetadataCacheKey::operator==dominatingHistogram<UInt64>.RecordedValues.next()— roughly 6% of main-thread wall time. Annotating the caller with@_assemblyVisionin release build produced:Root cause:
Histogram<Count>is generic, but the iterator bodies are not@inlinable, so cross-module callers can't see through them and every iteration step falls back to runtime generic dispatch with metadata-cache lookups.Changes in
Sources/Histogram/Histogram.swiftPart 1 —
@inlinable/@usableFromInlineIteratorImplbecomes@usableFromInline; every stored property and every method (init,hasNext,moveNext,makeIterationValueAndUpdatePrev,exhaustedSubBuckets,valueIteratedTo,incrementSubBucket) annotated accordingly.Percentiles,LinearBucketValues,LogarithmicBucketValues,RecordedValues,AllValues— has its stored properties marked@usableFromInlineand itsinit/next()/ private helpers marked@inlinable(with theirprivateremoved where needed so the inliner can see them).recordedValues(),allValues(),percentiles(_:),linearBucketValues(_:),logarithmicBucketValues(_:_:)marked@inlinable.@inlinable(with@usableFromInlinewhere previously internal, andprivateremoved where present):highestEquivalentForValue,lowestEquivalentForValue,nextNonEquivalentForValue,sizeOfEquivalentRangeForValue,sizeOfEquivalentRangeFor,valueFromIndex,valueFrom,subBucketIndexForValue,bucketIndexForValue,countsIndexForValue,countsIndexFor.Histogramthat the above reach —bucketCount,subBucketMask,subBucketHalfCountMagnitude,leadingZeroCountBase,unitMagnitude— marked@usableFromInline.subBucketCount/subBucketHalfCountmarked@usableFromInlinewith@inlinable get.IterationValuegets an explicit@usableFromInlinememberwise initializer (the synthesised one was internal), somakeIterationValueAndUpdatePrevcan construct it from an@inlinablecontext.Part 2 —
add(_:)New
@inlinable public mutating func add(_ other: Histogram)that traps on shape mismatch (counts.count,numberOfSignificantValueDigits,lowestDiscernibleValue) and otherwise sums the count arrays element-wise (&+=) viawithUnsafeMutableBufferPointer/withUnsafeBufferPointer._totalCountis summed;maxValueandminNonZeroValuetake the correct max/min.Tests
New
Tests/HistogramTests/HistogramMergeTests.swift:testMergeRoundTrip— splits a mixed-value batch across two histograms, merges, and verifiestotalCount,maxRecorded,minNonZero,valueAtPercentileat 0 / 25 / 50 / 75 / 90 / 99 / 99.9 / 100, and bucket-by-bucket equality, all against a reference histogram that recorded the same batch directly.testAddEmptyOtherLeavesSelfUnchanged— merging an empty histogram leavesselfequal to its pre-merge snapshot.testAddSelfDoublesCounts—a.add(a)doubles every bucket count (andtotalCount), leaves max / min unchanged.testShapeMismatchesAreObservable— documents and verifies each shape-mismatch input class (differinghighestTrackableValue,numberOfSignificantValueDigits,lowestDiscernibleValue). A note explains that catching thepreconditiontrap itself requires signal-handling test infra (CwlPreconditionTesting or similar) that the package doesn't currently depend on, so the test asserts that the guarded conditions really are differing shapes.Full test run: 54 tests passed, 0 failures.
Verifying specialisation —
@_assemblyVisionbefore / afterMinimal downstream package depending on this one via local path, with:
Built with
swift build -c release 2>&1 | tee /tmp/av.txt, thengrep "Unable to specialize" /tmp/av.txt:Before
(All ten of the iterator accessors and their
next()methods miss specialisation. 80 total emission points across the consumer's five@_assemblyVisionfunctions.)After
The previously-unspecialisable functions are now concretely specialised for
Histogram<UInt64>. For example:Scope notes
add(_:)is additive.HistogramandIteratorImplis unchanged — only attributes added.@inlinablelocks the implementation into the ABI surface; for these paths the algorithm is determined by the HDR histogram layout itself so that tradeoff seems fine.Test plan
swift buildcleanswift test— 54/54 pass@_assemblyVisionsmoke: 80 → 0 "Unable to specialize" remarks; iteratornext()specialisations now visible