Performance anomaly observed in hybrid KEM operations with oqs-provider #729

@ChanKamyung

Description

Environment Setup:

  • OpenSSL 3.4.3
  • liboqs 0.15.0
  • oqs-provider 0.11.0-rc1
  • CPU: Intel Core i7-1165G7
  • Operating System: Ubuntu 24.04.3 LTS (Virtual Machine)

Test Results:

$ openssl speed mlkem768
Doing mlkem768 keygen ops for 10s: 722163 mlkem768 KEM keygen ops in 9.83s
Doing mlkem768 encaps ops for 10s: 805409 mlkem768 KEM encaps ops in 9.81s
Doing mlkem768 decaps ops for 10s: 653917 mlkem768 KEM decaps ops in 9.86s
                               keygen    encaps    decaps keygens/s  encaps/s  decaps/s
                   mlkem768 0.000014s 0.000012s 0.000015s   73465.2   82100.8   66320.2

$ openssl speed X25519
Doing X25519 keygen ops for 10s: 204817 X25519 KEM keygen ops in 9.84s
Doing X25519 encaps ops for 10s: 89317 X25519 KEM encaps ops in 9.87s
Doing X25519 decaps ops for 10s: 180618 X25519 KEM decaps ops in 9.85s
                               keygen    encaps    decaps keygens/s  encaps/s  decaps/s
                     X25519 0.000048s 0.000111s 0.000055s   20814.7    9049.3   18336.9

$ openssl speed X25519MLKEM768
Doing X25519MLKEM768 keygen ops for 10s: 149695 X25519MLKEM768 KEM keygen ops in 9.80s
Doing X25519MLKEM768 encaps ops for 10s: 87746 X25519MLKEM768 KEM encaps ops in 9.88s
Doing X25519MLKEM768 decaps ops for 10s: 85120 X25519MLKEM768 KEM decaps ops in 9.92s
                               keygen    encaps    decaps keygens/s  encaps/s  decaps/s
             X25519MLKEM768 0.000065s 0.000113s 0.000117s   15275.0    8881.2    8580.6

Performance Analysis:

  1. Key Generation: X25519MLKEM768 keygen time (0.000065s) ≈ X25519 keygen (0.000048s) + mlkem768 keygen (0.000014s) = 0.000062s ✓

  2. Encapsulation: X25519MLKEM768 encaps time (0.000113s) ≈ X25519 encaps (0.000111s) + mlkem768 encaps (0.000012s) = 0.000123s ✓

  3. Decapsulation: X25519MLKEM768 decaps time (0.000117s) ≈ 1.67 × (X25519 decaps (0.000055s) + mlkem768 decaps (0.000015s)) = 1.67 × 0.000070s = 0.0001169s ✗

Expected vs Actual Decapsulation Performance:

  • Expected: ~0.000070s (sum of individual operations)
  • Actual: 0.000117s (~1.67× slower than expected)
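
To make the arithmetic above easy to re-check, here is a minimal Python sketch that recomputes the expected (summed) and actual hybrid times from the oqs-provider summary rows. The dictionary layout and the function name `hybrid_overhead` are my own illustration and not part of OpenSSL or oqs-provider. It assumes the hybrid cost should be roughly the sum of the two component costs.

```python
# Per-operation times in microseconds, copied from the oqs-provider
# `openssl speed` summary rows above.
TIMES_US = {
    "keygen": {"mlkem768": 14, "X25519": 48, "X25519MLKEM768": 65},
    "encaps": {"mlkem768": 12, "X25519": 111, "X25519MLKEM768": 113},
    "decaps": {"mlkem768": 15, "X25519": 55, "X25519MLKEM768": 117},
}


def hybrid_overhead(times):
    """Return {op: (expected_us, actual_us, ratio)} for each operation.

    expected_us is the sum of the two component times (an assumption about
    what a hybrid KEM should cost), actual_us is the measured hybrid time.
    """
    result = {}
    for op, t in times.items():
        expected = t["mlkem768"] + t["X25519"]
        actual = t["X25519MLKEM768"]
        result[op] = (expected, actual, actual / expected)
    return result


if __name__ == "__main__":
    for op, (expected, actual, ratio) in hybrid_overhead(TIMES_US).items():
        print(f"{op}: expected {expected} us, actual {actual} us, ratio {ratio:.2f}")
```

With these inputs the keygen and encaps ratios stay close to 1, while the decaps ratio comes out at about 1.67.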

Contrasting Results with OpenSSL 3.6.0:
When testing the same algorithms using OpenSSL 3.6.0 (which includes native post-quantum algorithm support), the hybrid KEM performance aligns with expectations - all three operations show execution times approximately equal to the sum of their individual components.

$ openssl speed -seconds 3 ML-KEM-768
Doing ML-KEM-768 keygen ops for 3s: 48683 ML-KEM-768 KEM keygen ops in 2.85s
Doing ML-KEM-768 encaps ops for 3s: 85463 ML-KEM-768 KEM encaps ops in 2.95s
Doing ML-KEM-768 decaps ops for 3s: 52684 ML-KEM-768 KEM decaps ops in 2.96s
                               keygen    encaps    decaps keygens/s  encaps/s  decaps/s
                   ML-KEM-768 0.000059s 0.000035s 0.000056s   17081.8   28970.5   17798.6

$ openssl speed -seconds 3 X25519
Doing X25519 keygen ops for 3s: 60690 X25519 KEM keygen ops in 2.94s
Doing X25519 encaps ops for 3s: 26533 X25519 KEM encaps ops in 2.96s
Doing X25519 decaps ops for 3s: 53032 X25519 KEM decaps ops in 2.98s
                               keygen    encaps    decaps keygens/s  encaps/s  decaps/s
                     X25519 0.000048s 0.000112s 0.000056s   20642.9    8963.9   17796.0

$ openssl speed -seconds 3 X25519MLKEM768
Doing X25519MLKEM768 keygen ops for 3s: 26448 X25519MLKEM768 KEM keygen ops in 2.92s
Doing X25519MLKEM768 encaps ops for 3s: 21533 X25519MLKEM768 KEM encaps ops in 2.94s
Doing X25519MLKEM768 decaps ops for 3s: 26301 X25519MLKEM768 KEM decaps ops in 2.86s
                               keygen    encaps    decaps keygens/s  encaps/s  decaps/s
             X25519MLKEM768 0.000110s 0.000137s 0.000109s   9057.5    7324.1    9196.2
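
For the OpenSSL 3.6.0 numbers, the same comparison can be derived from the raw "Doing ... ops" lines instead of the rounded summary rows. The sketch below is only an illustration: the regular expression is fitted to the output format shown in this issue, and the helper names `per_op_seconds` and `hybrid_ratio` are hypothetical.

```python
import re

# Raw progress lines copied from the OpenSSL 3.6.0 runs above.
RAW = """\
Doing ML-KEM-768 keygen ops for 3s: 48683 ML-KEM-768 KEM keygen ops in 2.85s
Doing ML-KEM-768 encaps ops for 3s: 85463 ML-KEM-768 KEM encaps ops in 2.95s
Doing ML-KEM-768 decaps ops for 3s: 52684 ML-KEM-768 KEM decaps ops in 2.96s
Doing X25519 keygen ops for 3s: 60690 X25519 KEM keygen ops in 2.94s
Doing X25519 encaps ops for 3s: 26533 X25519 KEM encaps ops in 2.96s
Doing X25519 decaps ops for 3s: 53032 X25519 KEM decaps ops in 2.98s
Doing X25519MLKEM768 keygen ops for 3s: 26448 X25519MLKEM768 KEM keygen ops in 2.92s
Doing X25519MLKEM768 encaps ops for 3s: 21533 X25519MLKEM768 KEM encaps ops in 2.94s
Doing X25519MLKEM768 decaps ops for 3s: 26301 X25519MLKEM768 KEM decaps ops in 2.86s
"""

# Pattern matched to the line format shown in this issue (an assumption,
# not a documented OpenSSL output contract).
LINE = re.compile(
    r"Doing (?P<alg>\S+) (?P<op>keygen|encaps|decaps) ops for \d+s: "
    r"(?P<count>\d+) \S+ KEM \w+ ops in (?P<secs>[\d.]+)s"
)


def per_op_seconds(text):
    """Map (algorithm, operation) -> measured seconds per operation."""
    out = {}
    for m in LINE.finditer(text):
        out[(m["alg"], m["op"])] = float(m["secs"]) / int(m["count"])
    return out


def hybrid_ratio(times, hybrid, parts):
    """Ratio of hybrid time to the summed component times, per operation."""
    return {
        op: times[(hybrid, op)] / sum(times[(p, op)] for p in parts)
        for op in ("keygen", "encaps", "decaps")
    }


if __name__ == "__main__":
    times = per_op_seconds(RAW)
    ratios = hybrid_ratio(times, "X25519MLKEM768", ["ML-KEM-768", "X25519"])
    for op, ratio in ratios.items():
        print(f"{op}: hybrid / (sum of parts) = {ratio:.2f}")
```

All three ratios for 3.6.0 land within about 10% of 1, which is what I mean by "aligns with expectations".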

Question:
The decapsulation discrepancy with oqs-provider is counterintuitive. Some overhead for hybrid operations is expected, but a decapsulation time of roughly 1.67× the sum of the individual operations warrants investigation.

Could this be related to:

  1. Implementation inefficiencies in the hybrid decapsulation code path?
  2. Additional context setup or key parsing overhead specific to oqs-provider?
  3. Known optimizations or configuration options that might improve performance?

Any insights or guidance would be greatly appreciated.

Labels: question, No code change required
