Make LIFO slot stealing opt-in#8092
Conversation
4ba387c to
bafc878
Compare
|
There aren't any changes on master that can't be part of a 1.52.2 release, so merging into master is fine. |
b874b10 to
6add0b7
Compare
| /// [`enable_eager_lifo_handoff`] option is also enabled. | ||
| /// | ||
| /// See the docs for [`disable_lifo_slot`] for more details on what the | ||
| /// LIFO slot is. |
There was a problem hiding this comment.
Add some info how the new option(s) depend on the value of disable_lifo_slot ?!
Currently it is possible to do: builder.disable_lifo_slot().enable_lifo_slot_stealing().
Or even add debug_assert!(!self.disable_lifo_slot, "Enabling LIFO slot stealing does not make sense when LIFO slot is disabled");
There was a problem hiding this comment.
Updating docs and making these panic.
| } else { | ||
| // We pushed to the LIFO slot, and it was empty, so only notify if eager lifo | ||
| // handoff is enabled. | ||
| self.shared.config.enable_eager_lifo_handoff |
There was a problem hiding this comment.
| self.shared.config.enable_eager_lifo_handoff | |
| self.shared.config.enable_lifo_stealing && self.shared.config.enable_eager_lifo_handoff |
no need to notify if stealing is not enabled, no ?
There was a problem hiding this comment.
I'm changing builder to panic in this scenario, so it's not necessary to check both here.
carllerche
left a comment
There was a problem hiding this comment.
I would recommend against specific options for stealing from LIFO etc.. and have a general "wake up more aggressively" option that is used as a basic toggle for the runtime to decide if it should bias for waking up or not.
|
I agree but I don't want to do that in a patch release. |
|
Feel free to dismiss my review if it doesn't apply here |
|
The implementation looks sound, have approved. I do somewhat question adding options in a patch but I will not block on that. |
|
I don't feel that strongly about this, so I'm not going to dismiss your review. I filed #8100 to revert instead. |
|
I assume Eliza will pick this up when re-enabling this, so closing. |
In order to re-enable stealing tasks from the LIFO slot, we would like to allow the user to configure whether or not pushing tasks to the LIFO slot unparks another worker thread. This is in order to avoid performance regressions due to the overhead of additional cross-thread wakeups (i.e. #8065). In [this comment][1] on #8092, @carllerche suggested that we do this by having a single option controlling whether we wake parked workers more or less aggressively which would also control the eager I/O and time driver handoff behavior from #8010, which also causes the runtime to wake parked workers more aggressively in order to prevent potential deadlocks and task starvation. This PR adds such a knob to the `runtime::Builder` type, which requires `tokio_unstable`. This is represented as an `UnparkingMode` enum. We use an enum rather than a `bool` primarily because I would like to add a third unparking mode in the future. This mode would wake parked workers much less frequently, but would instead track whether there are any stealable tasks enqueued, and wake a stealer only when _transitioning_ from 0-1 stealable tasks, and have parking workers park with a timeout while we are in the "stealable tasks" state. The idea is that we would prevent deadlocks while avoiding spurious IPIs in this case. But, I haven't actually implemented this behavior yet, so presently, the enum has two variants: `Traditional`, which is the current runtime behavior, and `Cautious`, which wakes more aggressively to try to prevent deadlocks (and enables waking on LIFO pushes and on I/O driver handoff). The existing `enable_eager_driver_handoff` is deprecated in favor of the new API. I also felt that the enum made it a bit easier to document the different behaviors. If anyone has input on how we can better explain the two unparking modes to users, I'd welcome suggestions! Closes #8118 [1]: #8092 (review)
In order to re-enable stealing tasks from the LIFO slot, we would like to allow the user to configure whether or not pushing tasks to the LIFO slot unparks another worker thread. This is in order to avoid performance regressions due to the overhead of additional cross-thread wakeups (i.e. #8065). In [this comment][1] on #8092, @carllerche suggested that we do this by having a single option controlling whether we wake parked workers more or less aggressively which would also control the eager I/O and time driver handoff behavior from #8010, which also causes the runtime to wake parked workers more aggressively in order to prevent potential deadlocks and task starvation. This PR adds such a knob to the `runtime::Builder` type, which requires `tokio_unstable`. This is represented as an `UnparkingMode` enum. We use an enum rather than a `bool` primarily because I would like to add a third unparking mode in the future. This mode would wake parked workers much less frequently, but would instead track whether there are any stealable tasks enqueued, and wake a stealer only when _transitioning_ from 0-1 stealable tasks, and have parking workers park with a timeout while we are in the "stealable tasks" state. The idea is that we would prevent deadlocks while avoiding spurious IPIs in this case. But, I haven't actually implemented this behavior yet, so presently, the enum has two variants: `Traditional`, which is the current runtime behavior, and `Cautious`, which wakes more aggressively to try to prevent deadlocks (and enables waking on LIFO pushes and on I/O driver handoff). The existing `enable_eager_driver_handoff` is deprecated in favor of the new API. I also felt that the enum made it a bit easier to document the different behaviors. If anyone has input on how we can better explain the two unparking modes to users, I'd welcome suggestions! Closes #8118 [1]: #8092 (review)
We made it possible to steal the LIFO slot in #7431, but this causes a performance regression #8065. Thus, make it opt-in.
The feature is split into two options:
Both options are needed to make the runtime fully tolerate slow tasks, but I split them up as the first option can probably be enabled by default without adverse performance impact. But for now, both are off by default.
Fixes: #8065