perf: use Map for staticChildren to avoid megamorphic lookups #421

Open: joshuaisaact wants to merge 1 commit into delvedor:main from joshuaisaact:perf/charcode-indexed-static-children

joshuaisaact commented Mar 29, 2026

AI disclosure: This change was developed with Claude Code (Opus) using auto-claude, an automated experiment loop that edits code, benchmarks, and keeps or reverts changes. The profiling insight and initial fix (a sparse charCode-indexed array with a secondary list to preserve iteration order) emerged from 14 experiments — 8 kept, 6 reverted. I simplified it to a Map during review, which gives the same performance win with a much cleaner diff.

All code reviewed and understood by a human. Happy to close if this doesn't meet the bar.

Why

Profiling find() with --prof shows KeyedLoadIC_Megamorphic consuming ~14% of total ticks on deep tree lookups (e.g. long static + parametric routes). This happens because staticChildren is a plain object with dynamic single-character string keys, and every node has a different set of keys. V8 can't build a stable inline cache for obj[path.charAt(i)] when the object shapes vary across nodes, so it falls back to a slow hash-table lookup every time.
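To make the access pattern concrete, here is a minimal sketch of a trie whose children live in a plain object keyed by a single character. The names `makeObjectNode`, `addObjectChild`, and `findObjectChild` are hypothetical and the node layout is simplified; this is not find-my-way's actual code, only an illustration of the one keyed-load site that ends up seeing a different set of keys on almost every node.

```javascript
// Hypothetical, simplified trie node (assumption: not the library's real class).
function makeObjectNode(prefix) {
  return { prefix, staticChildren: {} };
}

function addObjectChild(parent, child) {
  // The key is the first character of the child's prefix, so each node's
  // staticChildren object accumulates its own, different set of property names.
  parent.staticChildren[child.prefix.charAt(0)] = child;
  return child;
}

function findObjectChild(node, path, index) {
  // A single keyed-load site shared by every node in the tree. Per the
  // description above, this is the load that shows up as megamorphic.
  return node.staticChildren[path.charAt(index)];
}

const objRoot = makeObjectNode('/');
const usersNode = addObjectChild(objRoot, makeObjectNode('users'));
addObjectChild(objRoot, makeObjectNode('posts'));
addObjectChild(usersNode, makeObjectNode('/settings'));

const objHit = findObjectChild(objRoot, '/users', 1);  // the 'users' node
const objMiss = findObjectChild(objRoot, '/teams', 1); // undefined, no 't' child
```

Here `objRoot.staticChildren` has keys `u` and `p` while `usersNode.staticChildren` has only `/`, which is the "different set of keys per node" situation the profile points at.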

What

Replace the staticChildren plain object with a Map keyed by character code. Map.get() is a single monomorphic call regardless of stored keys, which eliminates the megamorphic IC entirely. Maps also preserve insertion order, so prettyPrint iteration works without any extra bookkeeping.
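A minimal sketch of the Map-based shape, under the same simplifying assumptions: `MapNode` and its methods are hypothetical names for illustration, not the identifiers used in the actual diff.

```javascript
// Hypothetical, simplified node (assumption: the real diff touches the
// library's existing node class rather than introducing a new one).
class MapNode {
  constructor(prefix) {
    this.prefix = prefix;
    // Character code of the child's first character -> child node.
    this.staticChildren = new Map();
  }

  addChild(child) {
    this.staticChildren.set(child.prefix.charCodeAt(0), child);
    return child;
  }

  findChild(path, index) {
    // Always the same Map.prototype.get call, whatever keys are stored.
    // An out-of-range index yields NaN, which is not a stored key, so the
    // result is undefined.
    return this.staticChildren.get(path.charCodeAt(index));
  }
}

const mapRoot = new MapNode('/');
const zebraNode = mapRoot.addChild(new MapNode('zebra'));
const appleNode = mapRoot.addChild(new MapNode('apple'));

const mapHit = mapRoot.findChild('/apple', 1);

// Iteration follows insertion order, which is what a prettyPrint-style
// traversal relies on.
const childOrder = [...mapRoot.staticChildren.values()].map((n) => n.prefix);
// childOrder is ['zebra', 'apple']
```

Because a Map iterates in insertion order for any key type, the numeric keys need no secondary ordering list.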

The diff is +11/-10 across three files with no API or behavioral changes.

Benchmarks

Focused run on the 5 slowest cases (500 min samples):

main:
lookup long static + parametric route.............. x 5,560,904 ops/sec ±0.10% (595 runs sampled)
lookup long static route (common prefix)........... x 6,579,443 ops/sec ±0.13% (593 runs sampled)
lookup long static route........................... x 8,747,779 ops/sec ±0.20% (594 runs sampled)
lookup short parametric route (encoded optimized).. x 9,008,336 ops/sec ±0.24% (595 runs sampled)
lookup multi-parametric route with two short params x 9,183,080 ops/sec ±0.24% (594 runs sampled)

branch:
lookup long static + parametric route.............. x 6,266,005 ops/sec ±0.24% (594 runs sampled)
lookup long static route (common prefix)........... x 7,678,760 ops/sec ±0.16% (592 runs sampled)
lookup long static route........................... x 8,555,547 ops/sec ±0.18% (595 runs sampled)
lookup short parametric route (encoded optimized).. x 8,902,918 ops/sec ±0.16% (595 runs sampled)
lookup multi-parametric route with two short params x 8,723,864 ops/sec ±0.24% (593 runs sampled)

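+13% on the slowest case and +17% on common prefix. The other three cases are 1.2% to 5.0% lower on the branch (the multi-parametric case drops the most). That is larger than the reported per-run margins (±0.24% at most), so these may reflect run-to-run variance or a small regression rather than sampling noise.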
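For reference, the relative changes can be recomputed from the two runs listed above. This is a small sketch; the array layout and the short labels are illustrative, and the ops/sec figures are copied from the benchmark output.

```javascript
// [label, main ops/sec, branch ops/sec], copied from the runs above.
const benchRuns = [
  ['long static + parametric', 5560904, 6266005],
  ['long static (common prefix)', 6579443, 7678760],
  ['long static', 8747779, 8555547],
  ['short parametric (encoded optimized)', 9008336, 8902918],
  ['multi-parametric, two short params', 9183080, 8723864],
];

// Relative change of branch vs. main, in percent.
const benchDeltas = benchRuns.map(([label, main, branch]) => ({
  label,
  pct: ((branch - main) / main) * 100,
}));

for (const { label, pct } of benchDeltas) {
  const sign = pct >= 0 ? '+' : '';
  console.log(`${label}: ${sign}${pct.toFixed(1)}%`);
}
// Rounded to one decimal: +12.7%, +16.7%, -2.2%, -1.2%, -5.0%
```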

Full experiment log (14 experiments)
| # | Result | Description |
|---|--------|-------------|
| 1 | revert | Pool arrays with .length=0: regression ~30% across all cases |
| 2 | keep | charCode-indexed array for staticChildren: +16% on slowest case |
| 3 | keep | Flatten backtracking stack to avoid object allocation: +2.9% common prefix |
| 4 | keep | Inline findStaticMatchingChild into getNextNode: +6.2% on slowest |
| 5 | revert | charCodeAt loop replacing indexOf: multi-param regressed 8% |
| 6 | revert | Prototype matchPrefix replacing compiled new Function: long static regressed 25% |
| 7 | keep | Fast static chain skipping getNextNode for pure-static nodes: +5.4% common prefix |
| 8 | keep | Reuse safeDecodeURI result object |
| 9 | keep | Move try/catch out of find() into safeDecodeURI: +3.1% on slowest |
| 10 | revert | indexOf('%') fast path in safeDecodeURI: all cases regressed 5-9% |
| 11 | keep | Pre-allocate params array with index counter: +4.4% encoded optimized |
| 12 | keep | pureStaticNode flag: +3.7% common prefix |
| 13 | revert | Reuse find() result object: all cases regressed 5-7% (V8 escape analysis was already eliminating it) |
| 14 | revert | Inline safeDecodeURI into find(): function too large for TurboFan optimization budget |

Only experiment 2 (the core insight) is included in this PR. The rest were explored but either had tradeoffs or were additive optimizations on top.

joshuaisaact marked this pull request as ready for review March 29, 2026 15:21

Replace the staticChildren plain object with a Map keyed by character code.
Plain object property access with dynamic string keys causes V8 to fall into
megamorphic KeyedLoadIC when nodes have different sets of child keys. Map.get()
is a single monomorphic call regardless of stored keys, and Maps preserve
insertion order for iteration.

joshuaisaact force-pushed the perf/charcode-indexed-static-children branch from ec97dfd to 0231f01 on March 29, 2026 15:23