Improve evmasm block deduplicator performance#16684

Open
clonker wants to merge 4 commits into develop from block-deduplicator-perf

Member

@clonker clonker commented May 7, 2026

This is mainly a refactor of the BlockDeduplicator and no behavior change is intended.

Right now the deduplicator uses a std::set with a lexicographical compare over each block's body. A lookup therefore costs O(block_size * log N): it performs O(log N) comparisons, and each comparison walks both blocks assembly item by assembly item (in the worst case, the entire block).

The refactor replaces the set with an unordered_set that uses a custom hash function and a custom equality function. Each block's body is hashed once via hash_range, and equality checks use ranges::equal over the same range. Lookup is then O(block_size) on average.
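A minimal sketch of the hash-based lookup described above, using only the standard library. `Block`, `BlockHash`, `BlockEqual` and `findRepresentatives` are hypothetical names for illustration and are not the PR's code; the hand-written `hashCombine` stands in for boost's `hash_range`, and `std::equal` stands in for `ranges::equal`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a block body: a sequence of item codes.
using Block = std::vector<int>;

// A commonly used mixing step, similar in spirit to boost::hash_combine.
inline void hashCombine(std::size_t& _seed, std::size_t _value)
{
    _seed ^= _value + 0x9e3779b9 + (_seed << 6) + (_seed >> 2);
}

// Hashes the whole block body; one pass over the items.
struct BlockHash
{
    std::size_t operator()(Block const* _block) const
    {
        assert(_block != nullptr);
        std::size_t seed = 0;
        for (int item: *_block)
            hashCombine(seed, std::hash<int>{}(item));
        return seed;
    }
};

// Element-wise equality over the same range that was hashed.
struct BlockEqual
{
    bool operator()(Block const* _a, Block const* _b) const
    {
        return std::equal(_a->begin(), _a->end(), _b->begin(), _b->end());
    }
};

// For each block, returns the index of the first block with an identical body.
std::vector<std::size_t> findRepresentatives(std::vector<Block> const& _blocks)
{
    std::unordered_set<Block const*, BlockHash, BlockEqual> seen;
    std::vector<std::size_t> representative(_blocks.size());
    for (std::size_t i = 0; i < _blocks.size(); ++i)
    {
        // insert() either adds the block or returns the already stored duplicate.
        auto it = seen.insert(&_blocks[i]).first;
        representative[i] = static_cast<std::size_t>(*it - _blocks.data());
    }
    return representative;
}
```

Each insertion hashes the candidate block once and compares it only against blocks that land in the same bucket, which is where the average O(block_size) cost comes from.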

To achieve this,

  • AssemblyItem gets a hash_value friend so that boost::hash can find it via ADL.
  • BlockIterator gets a default constructor and a post-increment operator to satisfy the range concept (needed for hash_range and ranges::equal).
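The two requirements above can be sketched as follows. This is a hedged illustration, not the PR's code: `sketch::Item`, `sketch::ItemIterator` and `hashRange` are hypothetical stand-ins for AssemblyItem, BlockIterator and boost's `hash_range`, and the mixing step is written by hand so that only the standard library is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

namespace sketch
{

// Hypothetical stand-in for AssemblyItem: a type tag plus a payload.
class Item
{
public:
    Item(int _type, int _data): m_type(_type), m_data(_data) {}

    friend bool operator==(Item const& _a, Item const& _b)
    {
        return _a.m_type == _b.m_type && _a.m_data == _b.m_data;
    }

    // Hidden friend: invisible to ordinary lookup, but found via ADL whenever
    // an Item is passed to an unqualified hash_value(...) call.
    friend std::size_t hash_value(Item const& _item)
    {
        std::size_t seed = std::hash<int>{}(_item.m_type);
        seed ^= std::hash<int>{}(_item.m_data) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }

private:
    int m_type;
    int m_data;
};

// Hypothetical minimal forward iterator over a contiguous run of Items.
class ItemIterator
{
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = Item;
    using difference_type = std::ptrdiff_t;
    using pointer = Item const*;
    using reference = Item const&;

    ItemIterator() = default; // forward iterators must be default-constructible
    explicit ItemIterator(Item const* _position): m_position(_position) {}

    reference operator*() const { assert(m_position != nullptr); return *m_position; }
    ItemIterator& operator++() { ++m_position; return *this; }
    // Post-increment returns the old position, as the iterator concepts require.
    ItemIterator operator++(int) { ItemIterator copy = *this; ++*this; return copy; }

    friend bool operator==(ItemIterator const& _a, ItemIterator const& _b) { return _a.m_position == _b.m_position; }
    friend bool operator!=(ItemIterator const& _a, ItemIterator const& _b) { return !(_a == _b); }

private:
    Item const* m_position = nullptr;
};

#if __cplusplus >= 202002L
// Only checked when compiling as C++20 or later.
static_assert(std::forward_iterator<ItemIterator>);
#endif

}

// Generic range hasher in the spirit of boost::hash_range. It knows nothing
// about sketch::Item; the unqualified call below is resolved via ADL.
template <typename Iterator>
std::size_t hashRange(Iterator _first, Iterator _last)
{
    std::size_t seed = 0;
    for (; _first != _last; ++_first)
        seed ^= hash_value(*_first) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    return seed;
}
```

Removing either the defaulted constructor or the post-increment operator makes the `static_assert` fail under C++20, which is the kind of constraint the second bullet refers to.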

Some benchmarks on my machine:

  • wall time decreases significantly (4.5-10%) for the evmasm pipeline
  • for the ir pipeline, wall time decreases for solady (5.9%), with smaller gains for OZ (1.3%); prb-math and forge-std show a slight increase, but both are within their standard deviations, so this is basically noise
  • peak RSS is flat, so no memory blow-up
  • instruction counts drop everywhere, by up to 13.7%
Benchmark           Pipeline  Metric          Base                             Target                           Delta  
------------------  --------  --------------  -------------------------------  -------------------------------  -------
openzeppelin-5.6.1  evmasm    cpu_time        9.39s ± 0.04s                    8.97s ± 0.05s                    -4.5%  
                              creation_size   725,287 ± 0                      725,287 ± 0                      0.0%   
                              cycles          51,143,785,884 ± 231,191,619     48,970,472,312 ± 237,550,537     -4.25% 
                              instructions    139,854,492,021 ± 2,026,342      129,273,536,861 ± 2,343,796      -7.57% 
                              peak_rss        1079 ± 1 MiB                     1079 ± 1 MiB                     -0.01% 
                              runtime_size    650,752 ± 0                      650,752 ± 0                      0.0%   
                              wall_time       9.44s ± 0.04s                    9.02s ± 0.05s                    -4.47% 
                                                                                                                       
openzeppelin-5.6.1  ir        cpu_time        27.12s ± 0.26s                   26.76s ± 0.04s                   -1.33% 
                              creation_size   675,405 ± 0                      675,405 ± 0                      0.0%   
                              cycles          149,803,686,253 ± 1,129,014,471  148,069,507,225 ± 292,395,229    -1.16% 
                              instructions    320,431,242,127 ± 940,555        313,064,278,224 ± 1,706,536      -2.3%  
                              peak_rss        1469 ± 1 MiB                     1470 ± 1 MiB                     +0.07% 
                              runtime_size    598,355 ± 0                      598,355 ± 0                      0.0%   
                              wall_time       27.23s ± 0.26s                   26.87s ± 0.03s                   -1.34% 
                                                                                                                       
solady-0.1.26       evmasm    cpu_time        13.18s ± 0.04s                   11.89s ± 0.04s                   -9.85% 
                              creation_size   1,514,791 ± 0                    1,514,791 ± 0                    0.0%   
                              cycles          71,650,496,789 ± 154,698,769     64,819,441,832 ± 261,942,806     -9.53% 
                              instructions    214,144,937,711 ± 3,539,478      184,840,171,275 ± 4,082,548      -13.68%
                              peak_rss        1828 ± 1 MiB                     1828 ± 1 MiB                     +0.04% 
                              runtime_size    1,480,154 ± 0                    1,480,154 ± 0                    0.0%   
                              wall_time       13.25s ± 0.05s                   11.95s ± 0.04s                   -9.76% 
                                                                                                                       
solady-0.1.26       ir        cpu_time        70.88s ± 0.42s                   66.68s ± 0.28s                   -5.93% 
                              creation_size   1,681,895 ± 0                    1,681,895 ± 0                    0.0%   
                              cycles          394,580,301,731 ± 2,089,906,480  371,914,985,508 ± 1,503,526,815  -5.74% 
                              instructions    781,103,461,292 ± 2,181,497      677,557,956,149 ± 2,911,948      -13.26%
                              peak_rss        2953 ± 2 MiB                     2954 ± 3 MiB                     +0.02% 
                              runtime_size    1,650,497 ± 0                    1,650,497 ± 0                    0.0%   
                              wall_time       71.13s ± 0.42s                   66.93s ± 0.28s                   -5.91% 
                                                                                                                       
prb-math-4.1.1      evmasm    cpu_time        3.05s ± 0.01s                    2.89s ± 0.02s                    -5.21% 
                              creation_size   252,039 ± 0                      252,039 ± 0                      0.0%   
                              cycles          15,873,797,368 ± 32,906,950      15,114,115,394 ± 90,515,113      -4.79% 
                              instructions    49,362,879,554 ± 98,346          45,721,290,356 ± 71,266          -7.38% 
                              peak_rss        1173 ± 1 MiB                     1173 ± 1 MiB                     +0.01% 
                              runtime_size    250,178 ± 0                      250,178 ± 0                      0.0%   
                              wall_time       3.06s ± 0.01s                    2.90s ± 0.02s                    -5.26% 
                                                                                                                       
prb-math-4.1.1      ir        cpu_time        13.45s ± 0.11s                   13.45s ± 0.02s                   +0.05% 
                              creation_size   262,432 ± 0                      262,432 ± 0                      0.0%   
                              cycles          73,887,147,790 ± 586,549,982     73,871,375,642 ± 171,688,911     -0.02% 
                              instructions    154,542,673,005 ± 321,635        152,792,570,586 ± 478,006        -1.13% 
                              peak_rss        1499 ± 1 MiB                     1492 ± 0 MiB                     -0.51% 
                              runtime_size    260,832 ± 0                      260,832 ± 0                      0.0%   
                              wall_time       13.50s ± 0.12s                   13.50s ± 0.02s                   +0.07% 
                                                                                                                       
forge-std-1.16.1    evmasm    cpu_time        4.13s ± 0.03s                    3.81s ± 0.03s                    -7.75% 
                              creation_size   394,603 ± 0                      394,603 ± 0                      0.0%   
                              cycles          22,359,589,881 ± 114,043,632     20,591,056,394 ± 151,487,100     -7.91% 
                              instructions    63,078,580,181 ± 95,694          56,220,570,242 ± 75,356          -10.87%
                              peak_rss        534 ± 1 MiB                      533 ± 2 MiB                      -0.18% 
                              runtime_size    381,729 ± 0                      381,729 ± 0                      0.0%   
                              wall_time       4.15s ± 0.03s                    3.82s ± 0.03s                    -7.75% 
                                                                                                                       
forge-std-1.16.1    ir        cpu_time        17.70s ± 0.36s                   17.79s ± 0.14s                   +0.47% 
                              creation_size   422,073 ± 0                      422,073 ± 0                      0.0%   
                              cycles          98,206,831,939 ± 1,885,776,060   98,726,395,115 ± 764,655,639     +0.53% 
                              instructions    187,717,062,052 ± 162,869        181,186,753,902 ± 568,894        -3.48% 
                              peak_rss        833 ± 1 MiB                      833 ± 0 MiB                      +0.02% 
                              runtime_size    406,750 ± 0                      406,750 ± 0                      0.0%   
                              wall_time       17.77s ± 0.36s                   17.86s ± 0.14s                   +0.49% 
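For reading the table, the Delta column appears to be the relative change of Target against Base. A small sketch of that arithmetic follows; `relativeDeltaPercent` and `withinNoise` are hypothetical helpers, and the noise criterion in particular is an assumption rather than something taken from the benchmark tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Relative change of target against base, in percent (negative means a decrease).
double relativeDeltaPercent(double _base, double _target)
{
    assert(_base != 0.0);
    return (_target - _base) / _base * 100.0;
}

// Assumed noise criterion: the two means differ by no more than the larger
// of the two reported standard deviations.
bool withinNoise(double _base, double _baseStdDev, double _target, double _targetStdDev)
{
    return std::abs(_target - _base) <= std::max(_baseStdDev, _targetStdDev);
}
```

With the solady evmasm instruction counts, `relativeDeltaPercent(214144937711.0, 184840171275.0)` evaluates to about -13.68, matching the table. Under the assumed criterion, the forge-std ir wall times (17.77s ± 0.36s against 17.86s ± 0.14s) count as noise, while the solady ir wall times do not.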

@clonker clonker force-pushed the block-deduplicator-perf branch from 2be13ff to 462f621 Compare May 7, 2026 08:43
@argotorg argotorg deleted a comment from github-actions Bot May 7, 2026
@clonker clonker marked this pull request as ready for review May 7, 2026 08:48
Comment thread libevmasm/AssemblyItem.h
Comment on lines +238 to +250
/// Hash function compatible with `operator==`. Found via ADL by `boost::hash`.
friend std::size_t hash_value(AssemblyItem const& _item)
{
    std::size_t hash = 0;
    boost::hash_combine(hash, static_cast<int>(_item.m_type));
    if (_item.m_type == Operation)
        boost::hash_combine(hash, static_cast<int>(_item.instruction()));
    else if (_item.m_type == VerbatimBytecode)
        boost::hash_combine(hash, *_item.m_verbatimBytecode);
    else
        boost::hash_combine(hash, _item.data());
    return hash;
}
Contributor

This looks OK, I would just move it somewhere else.
Placing it between operator!= and operator< does not seem like the best choice.

Member Author

@clonker clonker May 11, 2026

I placed it here so it's evident that the implementation mirrors that of the equality operator. Where do you think it should live?

Comment thread libevmasm/BlockDeduplicator.h
Contributor

@blishko blishko left a comment

While the whole deduplication algorithm seems somewhat fragile to me, I believe the proposed changes preserve the existing behaviour and do not introduce any new issues.
