Skip to content

enhancement(scheduler): honor QueueOrderFn in preempt action#5142

Merged
volcano-sh-bot merged 4 commits intovolcano-sh:masterfrom
hajnalmt:fix/preempt-queue-order
Apr 30, 2026
Merged

enhancement(scheduler): honor QueueOrderFn in preempt action#5142
volcano-sh-bot merged 4 commits intovolcano-sh:masterfrom
hajnalmt:fix/preempt-queue-order

Conversation

@hajnalmt
Copy link
Copy Markdown
Member

@hajnalmt hajnalmt commented Mar 30, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR makes the preempt action honor QueueOrderFn via util.NewPriorityQueue, aligning preempt queue processing with allocate/reclaim and removing non-deterministic map-order queue traversal.

The PR fixes the underRequest array traversal too to honor queue ordering by switching to a mapped structure with underRequestByQueue. (and not looping trough it at every processed queue, which was a bug I think)

Which issue(s) this PR fixes:

Fixes #5139

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Improve scheduler preemption fairness and determinism by honoring queue order functions when selecting queues for preemption. @Osykov and @hajnalmt 

hajnalmt and others added 2 commits March 30, 2026 12:23
…ption

The "Preemption between Task within Job" loop overwrites
preemptorTasks[job.UID] for every starving job in underRequest, which
is shared across all queues. In multi-queue scenarios this mutates the
same preemptor state used later by the between-jobs preemption phase.

Because queue traversal previously depended on Go map iteration order,
the behavior becomes non-deterministic. If queue Q1 (with no relevant
preemptors) is visited first, its intra-job pass still iterates shared
underRequest entries and can drain/replace Q2's preemptorTasks entry.
When Q2 is visited later, the between-jobs loop sees empty preemptor
state and skips valid preemption, so starvation can persist.

Use a scoped local queue (intraJobPreemptors) for the intra-job pass
instead of reusing preemptorTasks[job.UID]. This preserves the original
preemptorTasks map populated during job discovery for between-jobs
preemption while keeping intra-job behavior isolated.

Add and document a multi-queue regression test that models this cross-
queue interference path and verifies that intra-job processing in one
queue does not invalidate between-jobs preemption in another queue.
The test also captures the prior flaky characteristic (majority pass,
intermittent fail) caused by non-deterministic queue iteration.

Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
Use util.NewPriorityQueue(ssn.QueueOrderFn) in preempt so queue processing
order is deterministic and aligned with allocate/reclaim behavior.

While preserving Osykov's original queue-order implementation intent,
keep the preemptorTasks overwrite fix behavior from the stacked base commit
by using scoped intra-job preemptor queues during intra-job preemption.

Also include the topology-aware multi-queue test coverage from the original
change to validate queue-ordered preemption behavior.

Signed-off-by: Vitalii Osykov <vitaliyosykov@gmail.com>
Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
@volcano-sh-bot volcano-sh-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Mar 30, 2026
@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 30, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a bug in the preemption action where preemptorTasks were being overwritten during intra-job preemption, which led to lost preemptors in multi-queue scenarios. It also introduces ordered queue processing using a priority queue and adds comprehensive regression tests. A review comment highlights that the intra-job preemption logic is currently redundant because it resides within the queue iteration loop, suggesting it be moved outside to improve performance.

Comment thread pkg/scheduler/actions/preempt/preempt.go
Build underRequest entries per queue and process only the current queue's
starving jobs in the intra-job preemption loop. This avoids cross-queue
iteration in each queue pass and keeps intra-job processing aligned with
the active queue context.

Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
@hajnalmt hajnalmt marked this pull request as ready for review March 30, 2026 12:12
Copilot AI review requested due to automatic review settings March 30, 2026 12:12
@volcano-sh-bot volcano-sh-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the scheduler’s preempt action to process queues deterministically using QueueOrderFn (via util.NewPriorityQueue), aligning behavior with other actions and addressing multi-queue preemption correctness.

Changes:

  • Replace Go map iteration over queues in preempt with a PriorityQueue ordered by ssn.QueueOrderFn.
  • Fix multi-queue intra-job preemption so per-queue processing doesn’t overwrite/drain preemptorTasks for jobs in other queues.
  • Extend preempt action tests with multi-queue regression coverage and add a queue-order-focused topology-aware scenario.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
pkg/scheduler/actions/preempt/preempt.go Switch queue traversal to a PriorityQueue (honoring QueueOrderFn) and scope intra-job preemptor queues to avoid cross-queue state corruption.
pkg/scheduler/actions/preempt/preempt_test.go Add regression test for multi-queue preemptor task overwrite and add a topology-aware scenario with queue order enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/scheduler/actions/preempt/preempt_test.go Outdated
Comment thread pkg/scheduler/actions/preempt/preempt.go Outdated
Iterate only relevant starving-job queues when building the preempt queue
priority structure to avoid unnecessary queue scans and extra no-preemptor
iterations. Also fix the topology-aware test case name typo.

Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
@hajnalmt
Copy link
Copy Markdown
Member Author

/area scheduling

@JesseStutler
Copy link
Copy Markdown
Member

/cc

devin-ai-integration Bot added a commit to exa-labs/volcano that referenced this pull request Apr 16, 2026
When a cluster is fully utilized, PredicateNodes() returns an empty list
because no node can fit the preemptor without evictions. This caused both
the preempt and reclaim actions to skip victim selection entirely — the
exact scenario where preemption/reclaim is most needed.

Fix: when predicateNodes is empty, fall back to allNodes so the victim
selection loop still runs on fully-utilized nodes.

Upstream issue: volcano-sh#4447
Upstream fix PRs (unmerged): volcano-sh#4448, volcano-sh#4613, volcano-sh#5142

Co-Authored-By: carlos <carlosmarques.personal@gmail.com>
@JesseStutler
Copy link
Copy Markdown
Member

@hajnalmt Seems that after #5141 merged but no conflicts? I think this pr is worth to also backport

continue
}

if assigned {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also thinking that it's useless codes, why do we need to push back again? It's weird, although it's not related with this pr

@JesseStutler
Copy link
Copy Markdown
Member

/approve
/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JesseStutler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2026
@JesseStutler
Copy link
Copy Markdown
Member

/cherrypick release-1.14
/cherrypick release-1.13

@volcano-sh-bot
Copy link
Copy Markdown
Contributor

@JesseStutler: once the present PR merges, I will cherry-pick it on top of release-1.14 in a new PR and assign it to you.

Details

In response to this:

/cherrypick release-1.14
/cherrypick release-1.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot volcano-sh-bot merged commit eed1bd6 into volcano-sh:master Apr 30, 2026
24 checks passed
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

@JesseStutler: #5142 failed to apply on top of branch "release-1.14":

Applying: fix(scheduler): prevent preemptorTasks overwrite in multi-queue preemption
Using index info to reconstruct a base tree...
M	pkg/scheduler/actions/preempt/preempt.go
M	pkg/scheduler/actions/preempt/preempt_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/scheduler/actions/preempt/preempt_test.go
CONFLICT (content): Merge conflict in pkg/scheduler/actions/preempt/preempt_test.go
Auto-merging pkg/scheduler/actions/preempt/preempt.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 fix(scheduler): prevent preemptorTasks overwrite in multi-queue preemption
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherrypick release-1.14
/cherrypick release-1.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot
Copy link
Copy Markdown
Contributor

@JesseStutler: new issue created for failed cherrypick: #5266

Details

In response to this:

/cherrypick release-1.14
/cherrypick release-1.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot
Copy link
Copy Markdown
Contributor

@JesseStutler: #5142 failed to apply on top of branch "release-1.13":

Applying: fix(scheduler): prevent preemptorTasks overwrite in multi-queue preemption
Using index info to reconstruct a base tree...
M	pkg/scheduler/actions/preempt/preempt.go
M	pkg/scheduler/actions/preempt/preempt_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/scheduler/actions/preempt/preempt_test.go
CONFLICT (content): Merge conflict in pkg/scheduler/actions/preempt/preempt_test.go
Auto-merging pkg/scheduler/actions/preempt/preempt.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 fix(scheduler): prevent preemptorTasks overwrite in multi-queue preemption
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherrypick release-1.14
/cherrypick release-1.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot
Copy link
Copy Markdown
Contributor

@JesseStutler: new issue created for failed cherrypick: #5267

Details

In response to this:

/cherrypick release-1.14
/cherrypick release-1.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copilot AI added a commit that referenced this pull request Apr 30, 2026
Cherry-pick of #5142 to release-1.13.

- Replace map-based queue iteration with util.NewPriorityQueue(ssn.QueueOrderFn)
  for deterministic, priority-based queue processing in preemption
- Use underRequestByQueue map instead of flat underRequest slice to scope
  intra-job preemption to the correct queue
- Add "preemption with priority queues" test case
- Add capacity plugin to TestTopologyAwarePreempt for queue ordering

Agent-Logs-Url: https://github.com/volcano-sh/volcano/sessions/5c62a51f-62e0-4cf9-a3d3-f23934f2da93

Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
Copilot AI added a commit that referenced this pull request Apr 30, 2026
Cherry-pick of #5142 to release-1.14.

- Replace non-deterministic map iteration of queues with priority queue
  using ssn.QueueOrderFn
- Replace shared underRequest slice with per-queue underRequestByQueue map
- Add regression test for multi-queue preemptorTasks overwrite in TestPreempt
- Add "preemption with priority queues" test in TestTopologyAwarePreempt
- Add capacity plugin to TestTopologyAwarePreempt for queue ordering

Agent-Logs-Url: https://github.com/volcano-sh/volcano/sessions/f63fac78-512b-47db-9996-ce7e75c36606

Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
volcano-sh-bot pushed a commit that referenced this pull request Apr 30, 2026
…action (#5268)

* Initial plan

* enhancement(scheduler): honor QueueOrderFn in preempt action

Cherry-pick of #5142 to release-1.14.

- Replace non-deterministic map iteration of queues with priority queue
  using ssn.QueueOrderFn
- Replace shared underRequest slice with per-queue underRequestByQueue map
- Add regression test for multi-queue preemptorTasks overwrite in TestPreempt
- Add "preemption with priority queues" test in TestTopologyAwarePreempt
- Add capacity plugin to TestTopologyAwarePreempt for queue ordering

Agent-Logs-Url: https://github.com/volcano-sh/volcano/sessions/f63fac78-512b-47db-9996-ce7e75c36606

Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
Copilot AI added a commit that referenced this pull request Apr 30, 2026
…ption and honor QueueOrderFn

Cherry-pick of PRs #5141 and #5142 onto release-1.12.

Changes:
- Replace shared underRequest slice with per-queue underRequestByQueue map
- Use scoped intraJobPreemptors queue instead of overwriting preemptorTasks[job.UID]
- Use QueueOrderFn via util.NewPriorityQueue for deterministic queue traversal
- Add regression tests for multi-queue preemptorTasks overwrite and priority queue ordering

Agent-Logs-Url: https://github.com/volcano-sh/volcano/sessions/344df7f0-d583-4c2c-b425-2a17d053a9f2

Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/scheduling kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enhancement(scheduler): Honor QueueOrderFn in preempt action for deterministic queue processing

5 participants