NVMe layer: support configuring attr_qid_max on nvmet subsystems #483

@Tydus

Version info:

  • LINSTOR server: 1.33.1
  • LINSTOR client: 1.27.1
  • Kernel: 6.8.12 (Debian 12 / Proxmox VE)
  • Transport: NVMe-oF over RDMA (RoCEv2)

Description:

When LINSTOR creates an nvmet subsystem, attr_qid_max defaults to 128. The nvme-rdma initiator creates one RDMA queue pair per I/O queue, and on hosts with many CPUs it requests all 128 queues. Each QP requires a full RDMA connection setup (rdma_cm resolve + QP creation), causing:

  • Connect: ~32 seconds to establish 128 RDMA QPs
  • Disconnect: 80+ seconds with fabrics command timeouts
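
The target-side limit behind this can be read directly from configfs; the default of 128 shows up there before any tuning (the subsystem name is a placeholder, as elsewhere in this report):

```shell
# Show the current queue-ID limit advertised by the nvmet subsystem
# (defaults to 128 on current kernels).
cat /sys/kernel/config/nvmet/subsystems/<subsystem>/attr_qid_max
```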

Setting attr_qid_max=16 on the nvmet subsystem reduces this to ~5s connect / ~3s disconnect, and actually improves throughput:

|  | 128 queues (default) | 16 queues |
| --- | --- | --- |
| Connect time | ~32s | ~5s |
| Disconnect time | 80+s (fabrics timeout) | ~3s |
| Read throughput (fio, 16 jobs, iodepth=16) | 9.8 GiB/s | 19.0 GiB/s |

128 RDMA QPs cause contention and halve throughput compared to 16 queues.
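
For reference, the throughput numbers above came from a read test along these lines (a sketch: only numjobs=16 and iodepth=16 are from the table; the device name, block size, engine, and runtime are assumptions):

```shell
# Sequential-read fio run against the connected NVMe-oF namespace.
# /dev/nvme3n1, bs, ioengine, and runtime are assumed values.
fio --name=nvmeof-read \
    --filename=/dev/nvme3n1 \
    --rw=read \
    --ioengine=libaio \
    --direct=1 \
    --bs=1M \
    --numjobs=16 \
    --iodepth=16 \
    --runtime=60 --time_based \
    --group_reporting
```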

Reproduction steps:

  1. Create an NVMe resource group and spawn a resource (target created on host B):
    linstor resource-group create nvme-rg --storage-pool <pool> --layer-list nvme,storage --place-count 1
    linstor volume-group create nvme-rg
    linstor resource-group spawn-resources nvme-rg test-nvme 100G
    
  2. Create an NVMe initiator on host A (a machine with many CPU cores):
    linstor resource create --nvme-initiator <hostA> test-nvme
    
  3. Observe ~30s delay before the command returns. dmesg on host A shows:
    nvme nvme3: creating 128 I/O queues.
    nvme nvme3: mapped 128/0/0 default/read/poll queues.
    
  4. Manually setting attr_qid_max on the target resolves the issue:
    echo 16 > /sys/kernel/config/nvmet/subsystems/<subsystem>/attr_qid_max
    
    Subsequent initiator connections create only 16 queues and complete in ~5 seconds.
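
For comparison, when connecting by hand with nvme-cli (outside LINSTOR), the initiator side can already be capped via the standard --nr-io-queues connect option; address and NQN are placeholders:

```shell
# Manually connect with a capped I/O queue count (nvme-cli).
# <target-ip> and <subsystem-nqn> are placeholders for the nvmet target.
nvme connect -t rdma -a <target-ip> -s 4420 \
    -n <subsystem-nqn> \
    --nr-io-queues=16
```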

Environment details:

The initiator host has 224 CPU cores, so nvme-rdma requests one I/O queue per CPU, capped at the target's advertised limit of 128. This is common in HPC/virtualization environments with high core counts.

Suggestion:

  • Target-side: An NVMe/QidMax property (settable on controller, resource-definition, or resource-group) that writes to attr_qid_max after creating the nvmet subsystem.
  • Initiator-side: Pass --nr-io-queues=<N> to nvme connect when creating initiators, controlled by the same or a separate property.
  • A sensible default (e.g., 32) would also help out of the box.
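
If implemented, target-side usage could look something like the following (NVMe/QidMax is the proposed, not-yet-existing property name; the set-property subcommands themselves exist in the LINSTOR client today):

```shell
# Hypothetical usage of the proposed NVMe/QidMax property,
# at the three suggested scopes:
linstor controller set-property NVMe/QidMax 32              # cluster-wide default
linstor resource-group set-property nvme-rg NVMe/QidMax 16
linstor resource-definition set-property test-nvme NVMe/QidMax 16
```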
