enhancement(scheduler): honor QueueOrderFn in preempt action#5142
enhancement(scheduler): honor QueueOrderFn in preempt action#5142volcano-sh-bot merged 4 commits intovolcano-sh:masterfrom
Conversation
…ption The "Preemption between Task within Job" loop overwrites preemptorTasks[job.UID] for every starving job in underRequest, which is shared across all queues. In multi-queue scenarios this mutates the same preemptor state used later by the between-jobs preemption phase. Because queue traversal previously depended on Go map iteration order, the behavior becomes non-deterministic. If queue Q1 (with no relevant preemptors) is visited first, its intra-job pass still iterates shared underRequest entries and can drain/replace Q2's preemptorTasks entry. When Q2 is visited later, the between-jobs loop sees empty preemptor state and skips valid preemption, so starvation can persist. Use a scoped local queue (intraJobPreemptors) for the intra-job pass instead of reusing preemptorTasks[job.UID]. This preserves the original preemptorTasks map populated during job discovery for between-jobs preemption while keeping intra-job behavior isolated. Add and document a multi-queue regression test that models this cross- queue interference path and verifies that intra-job processing in one queue does not invalidate between-jobs preemption in another queue. The test also captures the prior flaky characteristic (majority pass, intermittent fail) caused by non-deterministic queue iteration. Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
Use util.NewPriorityQueue(ssn.QueueOrderFn) in preempt so queue processing order is deterministic and aligned with allocate/reclaim behavior. While preserving Osykov's original queue-order implementation intent, keep the preemptorTasks overwrite fix behavior from the stacked base commit by using scoped intra-job preemptor queues during intra-job preemption. Also include the topology-aware multi-queue test coverage from the original change to validate queue-ordered preemption behavior. Signed-off-by: Vitalii Osykov <vitaliyosykov@gmail.com> Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request fixes a bug in the preemption action where preemptorTasks were being overwritten during intra-job preemption, which led to lost preemptors in multi-queue scenarios. It also introduces ordered queue processing using a priority queue and adds comprehensive regression tests. A review comment highlights that the intra-job preemption logic is currently redundant because it resides within the queue iteration loop, suggesting it be moved outside to improve performance.
Build underRequest entries per queue and process only the current queue's starving jobs in the intra-job preemption loop. This avoids cross-queue iteration in each queue pass and keeps intra-job processing aligned with the active queue context. Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
There was a problem hiding this comment.
Pull request overview
Updates the scheduler’s preempt action to process queues deterministically using QueueOrderFn (via util.NewPriorityQueue), aligning behavior with other actions and addressing multi-queue preemption correctness.
Changes:
- Replace Go map iteration over queues in preempt with a
PriorityQueueordered byssn.QueueOrderFn. - Fix multi-queue intra-job preemption so per-queue processing doesn’t overwrite/drain
preemptorTasksfor jobs in other queues. - Extend preempt action tests with multi-queue regression coverage and add a queue-order-focused topology-aware scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
pkg/scheduler/actions/preempt/preempt.go |
Switch queue traversal to a PriorityQueue (honoring QueueOrderFn) and scope intra-job preemptor queues to avoid cross-queue state corruption. |
pkg/scheduler/actions/preempt/preempt_test.go |
Add regression test for multi-queue preemptor task overwrite and add a topology-aware scenario with queue order enabled. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Iterate only relevant starving-job queues when building the preempt queue priority structure to avoid unnecessary queue scans and extra no-preemptor iterations. Also fix the topology-aware test case name typo. Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
|
/area scheduling |
|
/cc |
When a cluster is fully utilized, PredicateNodes() returns an empty list because no node can fit the preemptor without evictions. This caused both the preempt and reclaim actions to skip victim selection entirely — the exact scenario where preemption/reclaim is most needed. Fix: when predicateNodes is empty, fall back to allNodes so the victim selection loop still runs on fully-utilized nodes. Upstream issue: volcano-sh#4447 Upstream fix PRs (unmerged): volcano-sh#4448, volcano-sh#4613, volcano-sh#5142 Co-Authored-By: carlos <carlosmarques.personal@gmail.com>
| continue | ||
| } | ||
|
|
||
| if assigned { |
There was a problem hiding this comment.
I'm also thinking that it's useless codes, why do we need to push back again? It's weird, although it's not related with this pr
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JesseStutler The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/cherrypick release-1.14 |
|
@JesseStutler: once the present PR merges, I will cherry-pick it on top of release-1.14 in a new PR and assign it to you. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@JesseStutler: #5142 failed to apply on top of branch "release-1.14": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@JesseStutler: new issue created for failed cherrypick: #5266 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@JesseStutler: #5142 failed to apply on top of branch "release-1.13": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@JesseStutler: new issue created for failed cherrypick: #5267 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Cherry-pick of #5142 to release-1.13. - Replace map-based queue iteration with util.NewPriorityQueue(ssn.QueueOrderFn) for deterministic, priority-based queue processing in preemption - Use underRequestByQueue map instead of flat underRequest slice to scope intra-job preemption to the correct queue - Add "preemption with priority queues" test case - Add capacity plugin to TestTopologyAwarePreempt for queue ordering Agent-Logs-Url: https://github.com/volcano-sh/volcano/sessions/5c62a51f-62e0-4cf9-a3d3-f23934f2da93 Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
Cherry-pick of #5142 to release-1.14. - Replace non-deterministic map iteration of queues with priority queue using ssn.QueueOrderFn - Replace shared underRequest slice with per-queue underRequestByQueue map - Add regression test for multi-queue preemptorTasks overwrite in TestPreempt - Add "preemption with priority queues" test in TestTopologyAwarePreempt - Add capacity plugin to TestTopologyAwarePreempt for queue ordering Agent-Logs-Url: https://github.com/volcano-sh/volcano/sessions/f63fac78-512b-47db-9996-ce7e75c36606 Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
…action (#5268) * Initial plan * enhancement(scheduler): honor QueueOrderFn in preempt action Cherry-pick of #5142 to release-1.14. - Replace non-deterministic map iteration of queues with priority queue using ssn.QueueOrderFn - Replace shared underRequest slice with per-queue underRequestByQueue map - Add regression test for multi-queue preemptorTasks overwrite in TestPreempt - Add "preemption with priority queues" test in TestTopologyAwarePreempt - Add capacity plugin to TestTopologyAwarePreempt for queue ordering Agent-Logs-Url: https://github.com/volcano-sh/volcano/sessions/f63fac78-512b-47db-9996-ce7e75c36606 Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
…ption and honor QueueOrderFn Cherry-pick of PRs #5141 and #5142 onto release-1.12. Changes: - Replace shared underRequest slice with per-queue underRequestByQueue map - Use scoped intraJobPreemptors queue instead of overwriting preemptorTasks[job.UID] - Use QueueOrderFn via util.NewPriorityQueue for deterministic queue traversal - Add regression tests for multi-queue preemptorTasks overwrite and priority queue ordering Agent-Logs-Url: https://github.com/volcano-sh/volcano/sessions/344df7f0-d583-4c2c-b425-2a17d053a9f2 Co-authored-by: JesseStutler <38534065+JesseStutler@users.noreply.github.com>
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR makes the preempt action honor
QueueOrderFnviautil.NewPriorityQueue, aligning preempt queue processing with allocate/reclaim and removing non-deterministic map-order queue traversal.The PR fixes the
underRequestarray traversal too to honor queue ordering by switching to a mapped structure withunderRequestByQueue. (and not looping trough it at every processed queue, which was a bug I think)Which issue(s) this PR fixes:
Fixes #5139
Special notes for your reviewer:
Does this PR introduce a user-facing change?