Skip to content

feat(minor): Enable cross-module specialisation of iterators + add Histogram.add(_:) merge API#30

Merged
hassila merged 1 commit intoHdrHistogram:mainfrom
hassila:inlinable-iterators-and-merge-api
Apr 24, 2026
Merged

feat(minor): Enable cross-module specialisation of iterators + add Histogram.add(_:) merge API#30
hassila merged 1 commit intoHdrHistogram:mainfrom
hassila:inlinable-iterators-and-merge-api

Conversation

@hassila
Copy link
Copy Markdown
Contributor

@hassila hassila commented Apr 24, 2026

Summary

  • Part 1 — @inlinable iterator hot paths. Enable cross-module generic specialisation for iterator next() and all supporting helpers so downstream consumers compiling against Histogram<Count> from another SwiftPM target can specialise on the concrete Count. Fixes the _swift_getGenericMetadata / MetadataCacheKey::operator== hot path that showed up in a profiled consumer.
  • Part 2 — new Histogram.add(_ other:) merge API. Direct element-wise bucket sum; the operation the motivating consumer was previously reaching for by iterating recordedValues() and calling record(_:count:) into the target.

Both changes are additive and ABI-compatible (no public type/method signature changed or removed).

Context

Profiling a downstream consumer on a 3.4 GB workload showed _swift_getGenericMetadata / swift::MetadataCacheKey::operator== dominating Histogram<UInt64>.RecordedValues.next() — roughly 6% of main-thread wall time. Annotating the caller with @_assemblyVision in release build produced:

Unable to specialize generic function "Histogram.Histogram.recordedValues()" since definition is not visible
Unable to specialize generic function "Histogram.Histogram.RecordedValues.next()" since definition is not visible

Root cause: Histogram<Count> is generic, but the iterator bodies are not @inlinable, so cross-module callers can't see through them and every iteration step falls back to runtime generic dispatch with metadata-cache lookups.

Changes in Sources/Histogram/Histogram.swift

Part 1 — @inlinable / @usableFromInline

  • IteratorImpl becomes @usableFromInline; every stored property and every method (init, hasNext, moveNext, makeIterationValueAndUpdatePrev, exhaustedSubBuckets, valueIteratedTo, incrementSubBucket) annotated accordingly.
  • Every iterator family — Percentiles, LinearBucketValues, LogarithmicBucketValues, RecordedValues, AllValues — has its stored properties marked @usableFromInline and its init / next() / private helpers marked @inlinable (with their private removed where needed so the inliner can see them).
  • Public accessors recordedValues(), allValues(), percentiles(_:), linearBucketValues(_:), logarithmicBucketValues(_:_:) marked @inlinable.
  • Transitive helpers reached from the iterator hot paths marked @inlinable (with @usableFromInline where previously internal, and private removed where present): highestEquivalentForValue, lowestEquivalentForValue, nextNonEquivalentForValue, sizeOfEquivalentRangeForValue, sizeOfEquivalentRangeFor, valueFromIndex, valueFrom, subBucketIndexForValue, bucketIndexForValue, countsIndexForValue, countsIndexFor.
  • Stored fields on Histogram that the above reach — bucketCount, subBucketMask, subBucketHalfCountMagnitude, leadingZeroCountBase, unitMagnitude — marked @usableFromInline.
  • Computed subBucketCount / subBucketHalfCount marked @usableFromInline with @inlinable get.
  • IterationValue gets an explicit @usableFromInline memberwise initializer (the synthesised one was internal), so makeIterationValueAndUpdatePrev can construct it from an @inlinable context.

Part 2 — add(_:)

New @inlinable public mutating func add(_ other: Histogram) that traps on shape mismatch (counts.count, numberOfSignificantValueDigits, lowestDiscernibleValue) and otherwise sums the count arrays element-wise (&+=) via withUnsafeMutableBufferPointer / withUnsafeBufferPointer. _totalCount is summed; maxValue and minNonZeroValue take the correct max/min.

Tests

New Tests/HistogramTests/HistogramMergeTests.swift:

  • testMergeRoundTrip — splits a mixed-value batch across two histograms, merges, and verifies totalCount, maxRecorded, minNonZero, valueAtPercentile at 0 / 25 / 50 / 75 / 90 / 99 / 99.9 / 100, and bucket-by-bucket equality, all against a reference histogram that recorded the same batch directly.
  • testAddEmptyOtherLeavesSelfUnchanged — merging an empty histogram leaves self equal to its pre-merge snapshot.
  • testAddSelfDoublesCountsa.add(a) doubles every bucket count (and totalCount), leaves max / min unchanged.
  • testShapeMismatchesAreObservable — documents and verifies each shape-mismatch input class (differing highestTrackableValue, numberOfSignificantValueDigits, lowestDiscernibleValue). A note explains that catching the precondition trap itself requires signal-handling test infra (CwlPreconditionTesting or similar) that the package doesn't currently depend on, so the test asserts that the guarded conditions really are differing shapes.

Full test run: 54 tests passed, 0 failures.

Verifying specialisation — @_assemblyVision before / after

Minimal downstream package depending on this one via local path, with:

import Histogram

@_assemblyVision
public func consume(_ h: Histogram<UInt64>) -> UInt64 {
    var total: UInt64 = 0
    for v in h.recordedValues() { total &+= UInt64(v.count) }
    return total
}

Built with swift build -c release 2>&1 | tee /tmp/av.txt, then grep "Unable to specialize" /tmp/av.txt:

Before

$ grep -c "Unable to specialize" /tmp/av-before.txt
80

$ grep "Unable to specialize" /tmp/av-before.txt | grep -oE '"[^"]+"' | sort -u
"Histogram.Histogram.AllValues.next()"
"Histogram.Histogram.LinearBucketValues.next()"
"Histogram.Histogram.LogarithmicBucketValues.next()"
"Histogram.Histogram.Percentiles.next()"
"Histogram.Histogram.RecordedValues.next()"
"Histogram.Histogram.allValues()"
"Histogram.Histogram.linearBucketValues(valueUnitsPerBucket:)"
"Histogram.Histogram.logarithmicBucketValues(valueUnitsInFirstBucket:logBase:)"
"Histogram.Histogram.percentiles(ticksPerHalfDistance:)"
"Histogram.Histogram.recordedValues()"

(All ten of the iterator accessors and their next() methods miss specialisation. 80 total emission points across the consumer's five @_assemblyVision functions.)

After

$ grep -c "Unable to specialize" /tmp/av-after.txt
0

$ grep "Specialized function" /tmp/av-after.txt | grep -oE '"[^"]+"' | sort -u
"Histogram.Histogram.AllValues.next()"
"Histogram.Histogram.LinearBucketValues.next()"
"Histogram.Histogram.LogarithmicBucketValues.next()"
"Histogram.Histogram.Percentiles.next()"
"Histogram.Histogram.RecordedValues.next()"
"Histogram.Histogram.add(_:)"
"Histogram.Histogram.allValues()"
"Histogram.Histogram.linearBucketValues(valueUnitsPerBucket:)"
"Histogram.Histogram.logarithmicBucketValues(valueUnitsInFirstBucket:logBase:)"
"Histogram.Histogram.percentiles(ticksPerHalfDistance:)"
"Histogram.Histogram.recordedValues()"
"Swift.Sequence<>.makeIterator()"

The previously-unspecialisable functions are now concretely specialised for Histogram<UInt64>. For example:

Specialized function "Histogram.Histogram.RecordedValues.next()" with type
  (@inout Histogram<UInt64>.RecordedValues) -> @out Optional<Histogram<UInt64>.IterationValue>
Specialized function "Histogram.Histogram.recordedValues()" with type
  (@guaranteed Histogram<UInt64>) -> @out Histogram<UInt64>.RecordedValues
Specialized function "Histogram.Histogram.add(_:)" with type
  (@in_guaranteed Histogram<UInt64>, @inout Histogram<UInt64>) -> ()

Scope notes

  • No public API type or signature changed. add(_:) is additive.
  • Storage layout of Histogram and IteratorImpl is unchanged — only attributes added.
  • @inlinable locks the implementation into the ABI surface; for these paths the algorithm is determined by the HDR histogram layout itself so that tradeoff seems fine.

Test plan

  • swift build clean
  • swift test — 54/54 pass
  • @_assemblyVision smoke: 80 → 0 "Unable to specialize" remarks; iterator next() specialisations now visible
  • Project CI (runs on PR open)

@hassila hassila requested a review from dimlio April 24, 2026 04:45
@hassila hassila changed the title Enable cross-module specialisation of iterators + add Histogram.add(_:) merge API feat(minor): Enable cross-module specialisation of iterators + add Histogram.add(_:) merge API Apr 24, 2026
@hassila hassila force-pushed the inlinable-iterators-and-merge-api branch 2 times, most recently from 61a4812 to be766c0 Compare April 24, 2026 05:46
….add(_:)

Part 1 — @inlinable iterator hot paths

Downstream consumers that import `Histogram` from a different SwiftPM
target previously could not specialise the iterator calls because the
iterator method bodies were not visible to the SIL optimiser. Building
with `@_assemblyVision` in a release consumer emitted:

  Unable to specialize generic function
  "Histogram.Histogram.recordedValues()" since definition is not visible
  Unable to specialize generic function
  "Histogram.Histogram.RecordedValues.next()" since definition is not visible

and equivalents for every iterator family. In a profiled 3.4 GB workload
this showed up as `_swift_getGenericMetadata` /
`MetadataCacheKey::operator==` dominating the iterator hot path
(roughly 6% of main-thread wall time).

Mark the iterator hot paths `@inlinable` and their supporting types /
stored properties / transitive helpers `@usableFromInline`, so the
specialiser can inline concrete-`Count` iteration through the module
boundary:

- `IteratorImpl` and all its stored properties and methods.
- `Percentiles`, `LinearBucketValues`, `LogarithmicBucketValues`,
  `RecordedValues`, `AllValues` (stored properties, init, next(), and
  the private helpers they reach).
- Public accessors `recordedValues()`, `allValues()`, `percentiles(...)`,
  `linearBucketValues(...)`, `logarithmicBucketValues(...)`.
- Transitive helpers: `highestEquivalentForValue`,
  `lowestEquivalentForValue`, `nextNonEquivalentForValue`,
  `sizeOfEquivalentRangeForValue`, `sizeOfEquivalentRangeFor`,
  `valueFromIndex`, `valueFrom`, `subBucketIndexForValue`,
  `bucketIndexForValue`, `countsIndexForValue`, `countsIndexFor`.
- Supporting stored fields on `Histogram`: `bucketCount`,
  `subBucketMask`, `subBucketHalfCountMagnitude`,
  `leadingZeroCountBase`, `unitMagnitude`.
- Computed `subBucketCount` and `subBucketHalfCount`.
- Explicit `@usableFromInline` memberwise initialiser on
  `IterationValue` (the synthesised one was internal), so
  `makeIterationValueAndUpdatePrev` can construct it from an
  `@inlinable` context.

Part 2 — new `Histogram.add(_:)` merge API

The workload that prompted Part 1 was actually trying to merge two
histograms by iterating `recordedValues()` on one and calling
`record(_:count:)` on the other — O(recorded-values) with a log-bucket
index per value, and suffered the specialisation miss. Merging two
histograms of matching layout is really an element-wise sum of their
backing count arrays, which is what this method does:

    @inlinable public mutating func add(_ other: Self)

Semantics, designed to match a replay of `other.recordedValues()` into
`self` via `record(_:count:)`:

- `numberOfSignificantValueDigits` and `lowestDiscernibleValue` must
  match on both sides — those determine the bucket layout.
- Layouts match, so `self` and `other` compute the same index for a
  given value. `other`'s nonzero buckets all live at indices ≤
  `countsIndexForValue(other.maxValue)`. The merge touches `self`'s
  public range state only when that index would not fit in `self`'s
  current `counts`:
    * If it fits (including the empty `other.maxValue == 0` case and
      the case where `other`'s backing array is longer but its
      recorded values all fit in `self`), only the common prefix is
      summed. `self.counts.count` and `self.highestTrackableValue` are
      not touched.
    * Otherwise, if `self.autoResize == true`, `self` grows just
      enough to index `other.maxValue` (not
      `other.highestTrackableValue`, which can be much larger if
      `other` was resized then reset).
    * Otherwise, the merge is a precondition failure — silently
      accepting would drop counts. This mirrors `record(_:)`, which
      accepts values above the nominal `highestTrackableValue` as long
      as their computed index fits in `counts` (subBucketCount is
      rounded up to a power of two, so there is headroom past the
      nominal ceiling).

Implementation uses `withUnsafeMutableBufferPointer` /
`withUnsafeBufferPointer` with `&+=` for a tight loop. `_totalCount`,
`maxValue`, and `minNonZeroValue` are updated accordingly.

Tests

- New `Tests/HistogramTests/HistogramMergeTests.swift` covering:
  round-trip merge (with bucket-by-bucket equality and percentile
  agreement), empty-other, self-add, receiver auto-resizes when
  `other.maxValue` doesn't fit, merging a resized-and-reset `other`
  into a fresh receiver is a strict no-op (asserts unchanged
  `highestTrackableValue` and `counts.count`, not just `==` which
  ignores those), merging a larger `other` that only recorded small
  values does not grow the auto-resizing receiver (mirroring a replay
  of `recordedValues()`), merging a shorter `other` into a longer
  `self` leaves self's tail untouched, merging a longer-but-in-range
  `other` into a fixed-size receiver succeeds, merging values above
  the nominal `highestTrackableValue` that still fit in the backing
  array succeeds, and a documentation test pinning the inputs that
  would trap the precondition (catching the trap itself requires
  signal-handling infra the package does not depend on).
- Un-skipped the pre-existing `testAutoSizingAdd` in
  `Tests/HistogramTests/HistogramAutosizingTests.swift` (was
  `throw XCTSkip("Histogram.add() is not implemented yet")`).
- Full suite: 60/60 pass.

Verifying specialisation

Minimal downstream SwiftPM target imports `Histogram` and annotates
its iterator-using functions with `@_assemblyVision`:

    swift build -c release 2>&1 | tee /tmp/av.txt
    grep "Unable to specialize" /tmp/av.txt

Before: 10 unique "Unable to specialize" remarks (80 emissions across
five iterator families) — every public iterator accessor and its
`next()`. After: 0. `grep "Specialized function"` now shows concrete
specialisations like `Histogram.Histogram.RecordedValues.next()` with
`(@inout Histogram<UInt64>.RecordedValues) -> @out
Optional<Histogram<UInt64>.IterationValue>` and
`Histogram.Histogram.add(_:)`.

Scope

Additive and ABI-compatible. No public type or method signature
changed. Storage layout of `Histogram` and `IteratorImpl` preserved —
only attributes added, plus the one new `add(_:)` method.
@hassila hassila force-pushed the inlinable-iterators-and-merge-api branch from be766c0 to fac12a0 Compare April 24, 2026 06:15
@hassila hassila merged commit c2e1210 into HdrHistogram:main Apr 24, 2026
8 of 9 checks passed
@hassila hassila deleted the inlinable-iterators-and-merge-api branch April 24, 2026 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants