Skip to content

Make LIFO slot stealing opt-in#8092

Closed
Darksonn wants to merge 5 commits intotokio-rs:tokio-1.52.xfrom
Darksonn:lifo-stealing-optional
Closed

Make LIFO slot stealing opt-in#8092
Darksonn wants to merge 5 commits intotokio-rs:tokio-1.52.xfrom
Darksonn:lifo-stealing-optional

Conversation

@Darksonn
Copy link
Copy Markdown
Member

We made it possible to steal the LIFO slot in #7431, but this causes a performance regression #8065. Thus, make it opt-in.

The feature is split into two options:

  • Can stealing happen at all?
  • Are workers woken up eagerly to check whether stealing should be carried out?

Both options are needed to make the runtime fully tolerate slow tasks, but I split them up as the first option can probably be enabled by default without adverse performance impact. But for now, both are off by default.

Fixes: #8065

@Darksonn Darksonn requested a review from hawkw April 28, 2026 08:16
@Darksonn Darksonn added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime labels Apr 28, 2026
@github-actions github-actions Bot added the R-loom-multi-thread Run loom multi-thread tests on this PR label Apr 28, 2026
@Darksonn Darksonn force-pushed the lifo-stealing-optional branch from 4ba387c to bafc878 Compare April 28, 2026 08:19
@Darksonn
Copy link
Copy Markdown
Member Author

There aren't any changes on master that can't be part of a 1.52.2 release, so merging into master is fine.

@github-actions github-actions Bot added the R-loom-time-driver Run loom time driver tests on this PR label Apr 28, 2026
Comment thread tokio/src/runtime/time/wheel/mod.rs
@Darksonn Darksonn force-pushed the lifo-stealing-optional branch from b874b10 to 6add0b7 Compare April 29, 2026 06:01
@github-actions github-actions Bot removed the R-loom-time-driver Run loom time driver tests on this PR label Apr 29, 2026
Comment thread tokio/src/runtime/builder.rs
Comment thread tokio/src/runtime/builder.rs Outdated
Comment thread tokio/src/runtime/builder.rs
Comment thread tokio/src/runtime/scheduler/multi_thread/queue.rs Outdated
Comment thread tokio/src/runtime/scheduler/multi_thread/worker.rs Outdated
/// [`enable_eager_lifo_handoff`] option is also enabled.
///
/// See the docs for [`disable_lifo_slot`] for more details on what the
/// LIFO slot is.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some info how the new option(s) depend on the value of disable_lifo_slot ?!
Currently it is possible to do: builder.disable_lifo_slot().enable_lifo_slot_stealing().

Or even add debug_assert!(!self.disable_lifo_slot, "Enabling LIFO slot stealing does not make sense when LIFO slot is disabled");

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating docs and making these panic.

Comment thread tokio/src/runtime/tests/loom_multi_thread/queue.rs
} else {
// We pushed to the LIFO slot, and it was empty, so only notify if eager lifo
// handoff is enabled.
self.shared.config.enable_eager_lifo_handoff
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.shared.config.enable_eager_lifo_handoff
self.shared.config.enable_lifo_stealing && self.shared.config.enable_eager_lifo_handoff

no need to notify if stealing is not enabled, no ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm changing builder to panic in this scenario, so it's not necessary to check both here.

Comment thread tokio/src/runtime/scheduler/multi_thread/worker/taskdump.rs
Comment thread tokio/src/runtime/config.rs Outdated
Copy link
Copy Markdown
Member

@mattiapitossi mattiapitossi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Copy Markdown
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend against specific options for stealing from LIFO etc.. and have a general "wake up more aggressively" option that is used as a basic toggle for the runtime to decide if it should bias for waking up or not.

@Darksonn
Copy link
Copy Markdown
Member Author

Darksonn commented May 1, 2026

I agree but I don't want to do that in a patch release.

@carllerche
Copy link
Copy Markdown
Member

Feel free to dismiss my review if it doesn't apply here

@Noah-Kennedy
Copy link
Copy Markdown
Contributor

The implementation looks sound, have approved.

I do somewhat question adding options in a patch but I will not block on that.

@Darksonn
Copy link
Copy Markdown
Member Author

Darksonn commented May 4, 2026

I don't feel that strongly about this, so I'm not going to dismiss your review. I filed #8100 to revert instead.

@Darksonn
Copy link
Copy Markdown
Member Author

Darksonn commented May 5, 2026

I assume Eliza will pick this up when re-enabling this, so closing.

@Darksonn Darksonn closed this May 5, 2026
hawkw added a commit that referenced this pull request May 5, 2026
In order to re-enable stealing tasks from the LIFO slot, we would like
to allow the user to configure whether or not pushing tasks to the LIFO
slot unparks another worker thread. This is in order to avoid
performance regressions due to the overhead of additional cross-thread
wakeups (i.e. #8065). In [this comment][1] on #8092, @carllerche
suggested that we do this by having a single option controlling whether
we wake parked workers more or less aggressively which would also
control the eager I/O and time driver handoff behavior from #8010, which
also causes the runtime to wake parked workers more aggressively in
order to prevent potential deadlocks and task starvation.

This PR adds such a knob to the `runtime::Builder` type, which requires
`tokio_unstable`. This is represented as an `UnparkingMode` enum. We use
an enum rather than a `bool` primarily because I would like to add a
third unparking mode in the future. This mode would wake parked workers
much less frequently, but would instead track whether there are any
stealable tasks enqueued, and wake a stealer only when _transitioning_
from 0-1 stealable tasks, and have parking workers park with a timeout
while we are in the "stealable tasks" state. The idea is that we would
prevent deadlocks while avoiding spurious IPIs in this case. But, I
haven't actually implemented this behavior yet, so presently, the enum
has two variants: `Traditional`, which is the current runtime behavior,
and `Cautious`, which wakes more aggressively to try to prevent
deadlocks (and enables waking on LIFO pushes and on I/O driver handoff).
The existing `enable_eager_driver_handoff` is deprecated in favor of the
new API.

I also felt that the enum made it a bit easier to document the different
behaviors. If anyone has input on how we can better explain the two
unparking modes to users, I'd welcome suggestions!

Closes #8118
[1]: #8092 (review)
hawkw added a commit that referenced this pull request May 5, 2026
In order to re-enable stealing tasks from the LIFO slot, we would like
to allow the user to configure whether or not pushing tasks to the LIFO
slot unparks another worker thread. This is in order to avoid
performance regressions due to the overhead of additional cross-thread
wakeups (i.e. #8065). In [this comment][1] on #8092, @carllerche
suggested that we do this by having a single option controlling whether
we wake parked workers more or less aggressively which would also
control the eager I/O and time driver handoff behavior from #8010, which
also causes the runtime to wake parked workers more aggressively in
order to prevent potential deadlocks and task starvation.

This PR adds such a knob to the `runtime::Builder` type, which requires
`tokio_unstable`. This is represented as an `UnparkingMode` enum. We use
an enum rather than a `bool` primarily because I would like to add a
third unparking mode in the future. This mode would wake parked workers
much less frequently, but would instead track whether there are any
stealable tasks enqueued, and wake a stealer only when _transitioning_
from 0-1 stealable tasks, and have parking workers park with a timeout
while we are in the "stealable tasks" state. The idea is that we would
prevent deadlocks while avoiding spurious IPIs in this case. But, I
haven't actually implemented this behavior yet, so presently, the enum
has two variants: `Traditional`, which is the current runtime behavior,
and `Cautious`, which wakes more aggressively to try to prevent
deadlocks (and enables waking on LIFO pushes and on I/O driver handoff).
The existing `enable_eager_driver_handoff` is deprecated in favor of the
new API.

I also felt that the enum made it a bit easier to document the different
behaviors. If anyone has input on how we can better explain the two
unparking modes to users, I'd welcome suggestions!

Closes #8118
[1]: #8092 (review)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime R-loom-multi-thread Run loom multi-thread tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance regression on high-QPS short-handler workloads since 1.51.0

5 participants