Improve evmasm block deduplicator performance#16684
Open
clonker wants to merge 4 commits into
Open
Conversation
2be13ff to
462f621
Compare
blishko
reviewed
May 11, 2026
Comment on lines
+238
to
+250
| /// Hash function compatible with `operator==`. Found via ADL by `boost::hash`. | ||
| friend std::size_t hash_value(AssemblyItem const& _item) | ||
| { | ||
| std::size_t hash = 0; | ||
| boost::hash_combine(hash, static_cast<int>(_item.m_type)); | ||
| if (_item.m_type == Operation) | ||
| boost::hash_combine(hash, static_cast<int>(_item.instruction())); | ||
| else if (_item.m_type == VerbatimBytecode) | ||
| boost::hash_combine(hash, *_item.m_verbatimBytecode); | ||
| else | ||
| boost::hash_combine(hash, _item.data()); | ||
| return hash; | ||
| } |
Contributor
There was a problem hiding this comment.
This looks OK, I would just move it somewhere else.
Placing it between operator!= and operator< does not seem like the best choice.
Member
Author
There was a problem hiding this comment.
I placed it here so it's evident that the implementation follows the one of the equality operator. Where do you think it should live?
blishko
approved these changes
May 11, 2026
Contributor
blishko
left a comment
There was a problem hiding this comment.
While the whole deduplication algorithm seems somewhat fragile to me, I believe the proposed changes preserve the existing behaviour and do not introduce any new issues.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is mainly a refactor of the
BlockDeduplicatorand no behavior change is intended.Right now the deduplicator uses a
std::setwith a lexicographical compare over each block's body. Looking up a block is thereforeO(block_size * log N)per probe and each comparison walks both blocks assembly item by assembly item (worst case the entire block).The refactor exchanges the
setwith anunordered_setprovided a specific hash function and a specific equality function. Each block's body is hashed once viahash_rangeand equality checks useranges::equalover the same range. Lookup is thenO(block_size)amortized.To achieve this,
AssemblyItemgets ahash_valuefriend so that boost hash can load a hash via ADLBlockIteratorgets default ctor and post-increment to satisfy the range concept (needed forhash_rangeandranges::equal).Some benchmarks on my machine: