GPU prefilter produces incorrect results when masking is enabled #1083

@KimBioInfoStudio

Bug description

When the GPU-accelerated ungapped prefilter is used with the default masking setting (--mask 1), it misses some valid hits that the CPU prefilter finds. Setting --mask 0 produces results identical to the CPU.

Root cause

makepaddedseqdb applies tantan soft-masking to sequences before creating the GPU padded database. Soft-masked residues have their encoding value increased by 32 (e.g., A (0) becomes a (32)). In the GPU kernel, these masked values are passed through ClampToInvalid, which maps them to the last row of the PSSM (row 20), where the score is 0.

This means any soft-masked position gets a score of 0 in the GPU prefilter, while the CPU prefilter handles masking differently (applying a score penalty rather than zeroing). This discrepancy causes the GPU to miss hits that the CPU finds, especially for sequences with significant low-complexity regions.
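The effect described above can be sketched in a few lines of Python. This is a toy model and not the MMseqs2 kernel: `clamp_to_invalid` is a hypothetical stand-in for ClampToInvalid, the PSSM values are made up, and the scoring is a plain sum along one diagonal. Only the +32 offset and the zero-scoring row 20 come from the description above.

```python
MASK_OFFSET = 32   # from the report: soft-masked residues are encoded as value + 32
INVALID_ROW = 20   # from the report: last PSSM row, where every score is 0


def clamp_to_invalid(code: int) -> int:
    """Hypothetical stand-in for ClampToInvalid: out-of-range codes map to row 20."""
    return code if 0 <= code < INVALID_ROW else INVALID_ROW


def ungapped_score(pssm, target_codes):
    """Toy diagonal score: sum the PSSM entry for each target residue code."""
    return sum(pssm[clamp_to_invalid(c)][pos] for pos, c in enumerate(target_codes))


# Made-up PSSM: 21 rows (20 residues + invalid row) x 4 query positions.
pssm = [[0] * 4 for _ in range(INVALID_ROW + 1)]
pssm[0] = [4, 4, 4, 4]  # residue code 0 ("A") scores 4 at every position

unmasked = [0, 0, 0, 0]                              # AAAA
masked = [0, 0 + MASK_OFFSET, 0 + MASK_OFFSET, 0]    # AaaA, middle residues soft-masked

print(ungapped_score(pssm, unmasked))  # 16
print(ungapped_score(pssm, masked))    # 8
```

In this sketch the two masked positions contribute nothing, so the diagonal score is halved. A target whose unmasked score would pass the prefilter threshold can therefore fall below it once part of it is soft-masked.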

Reproduction

# Create test databases
mmseqs createdb query.fasta queryDB
mmseqs createdb target.fasta targetDB

# Create padded DB with masking (default)
mmseqs makepaddedseqdb targetDB targetDB_pad

# GPU search - may miss hits
mmseqs search queryDB targetDB_pad result_gpu tmp --gpu 1

# CPU search (ungapped) - finds all hits
mmseqs search queryDB targetDB result_cpu tmp --prefilter-mode 1

# Workaround: disable masking
mmseqs makepaddedseqdb targetDB targetDB_pad_nomask --mask 0
mmseqs search queryDB targetDB_pad_nomask result_gpu2 tmp --gpu 1
# result_gpu2 matches result_cpu

Observed behavior

  • GPU with --mask 1 (default): missing hits compared to CPU
  • GPU with --mask 0: results identical to CPU
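To list which hits the GPU run misses, the two results can be compared as sets of (query, target) pairs. The sketch below assumes both results have already been exported to tab-separated text with the query ID and target ID in the first two columns (for example with mmseqs convertalis). The helper names and the sample rows are hypothetical and are not taken from an actual run.

```python
def hit_pairs(lines):
    """Collect (query, target) pairs from tab-separated hit lines."""
    pairs = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def missing_hits(cpu_lines, gpu_lines):
    """Return the hits present in the CPU result but absent from the GPU result."""
    return sorted(hit_pairs(cpu_lines) - hit_pairs(gpu_lines))


# Made-up rows for illustration only; real input would be read from the exported files.
cpu = ["q1\tt1\t0.9\n", "q1\tt2\t0.5\n", "q2\tt3\t0.7\n"]
gpu = ["q1\tt1\t0.9\n", "q2\tt3\t0.7\n"]

print(missing_hits(cpu, gpu))  # [('q1', 't2')]
```

With real files, passing open file handles instead of the lists works the same way, since the functions only iterate over lines.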

This affects both protein and nucleotide GPU searches (including translated search modes blastx/tblastn/tblastx).

Workaround

Pass --mask 0 when creating the padded database or running GPU searches.

Environment

  • MMseqs2 commit: dca44bc
  • GPU: NVIDIA Blackwell (DGX Spark, compute capability 12.0)
  • CUDA: 12.x
