Bug description
When using GPU-accelerated ungapped prefilter with the default masking setting (--mask 1), some valid hits are missed compared to CPU results. Setting --mask 0 produces results identical to CPU.
Root cause
makepaddedseqdb applies tantan soft-masking to sequences before creating the GPU padded database. Soft-masked residues have their encoding value increased by 32 (e.g., A (0) becomes a (32)). In the GPU kernel, these masked values are passed through ClampToInvalid, which maps them to the last row of the PSSM (row 20), where the score is 0.
This means any soft-masked position gets a score of 0 in the GPU prefilter, while the CPU prefilter handles masking differently (applying a score penalty rather than zeroing). This discrepancy causes the GPU to miss hits that the CPU finds, especially for sequences with significant low-complexity regions.
Reproduction
# Create test databases
mmseqs createdb query.fasta queryDB
mmseqs createdb target.fasta targetDB
# Create padded DB with masking (default)
mmseqs makepaddedseqdb targetDB targetDB_pad
# GPU search - may miss hits
mmseqs search queryDB targetDB_pad result_gpu tmp --gpu 1
# CPU search (ungapped) - finds all hits
mmseqs search queryDB targetDB result_cpu tmp --prefilter-mode 1
# Workaround: disable masking
mmseqs makepaddedseqdb targetDB targetDB_pad_nomask --mask 0
mmseqs search queryDB targetDB_pad_nomask result_gpu2 tmp --gpu 1
# result_gpu2 matches result_cpu
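To make "missing hits" concrete, the two result sets can be diffed by query/target pair. The following is a minimal sketch, not part of MMseqs2: `missing_hits` is a hypothetical helper, and it assumes both results were first exported to tab-separated BLAST-tab (m8) files, for example with `mmseqs convertalis`, with the query ID in column 1 and the target ID in column 2. It also assumes target IDs are the same in the padded and unpadded databases.

```shell
# Hypothetical helper (not an MMseqs2 command): print the query/target pairs
# that appear in the CPU results but not in the GPU results.
# Assumes both files are tab-separated m8 output with query in column 1
# and target in column 2.
missing_hits() {
    cpu_pairs=$(mktemp)
    gpu_pairs=$(mktemp)
    # Reduce each file to its unique (query, target) pairs, sorted bytewise
    # so that comm sees consistently ordered input.
    cut -f1,2 "$1" | LC_ALL=C sort -u > "$cpu_pairs"
    cut -f1,2 "$2" | LC_ALL=C sort -u > "$gpu_pairs"
    # comm -23 keeps only the lines unique to the first file (CPU-only pairs).
    LC_ALL=C comm -23 "$cpu_pairs" "$gpu_pairs"
    rm -f "$cpu_pairs" "$gpu_pairs"
}

# Example usage, assuming the m8 files were exported beforehand:
# missing_hits result_cpu.m8 result_gpu.m8
```

An empty output means every CPU pair was also found by the GPU search. Under the behavior described here, that would be expected for the `--mask 0` run but not for the default run.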
Observed behavior
- GPU with --mask 1 (default): missing hits compared to CPU
- GPU with --mask 0: results identical to CPU
This affects both protein and nucleotide GPU searches (including translated search modes blastx/tblastn/tblastx).
Workaround
Pass --mask 0 when creating the padded database or running GPU searches.
Environment
- MMseqs2 commit: dca44bc
- GPU: NVIDIA Blackwell (DGX Spark, compute capability 12.0)
- CUDA: 12.x