Fix: surface selectLive() errors eagerly and improve LiveStream thread safety (#138)#139
Open
emmanuel-keller wants to merge 5 commits intomainfrom
Open
Fix: surface selectLive() errors eagerly and improve LiveStream thread safety (#138)#139emmanuel-keller wants to merge 5 commits intomainfrom
emmanuel-keller wants to merge 5 commits intomainfrom
Conversation
Add four new tests to LiveQueryTests covering the scenarios reported in issue #138 (live stream notifications and concurrent close): - liveStreamReceivesCreateNotification: blocks on next(), then CREATEs a record and asserts the notification arrives. - liveStreamReceivesUpdateNotification: same pattern for UPDATE. - closeUnblocksBlockedNext: verifies close() from another thread unblocks a thread blocked on next() without deadlocking. - concurrentNextAndCloseDoesNotCrash: stress test with multiple threads calling next() and close() concurrently.
Two fixes for issues around live query reliability (#138): 1. Make LiveStream.handle volatile so that close() on one thread is visible to next() on another, preventing a stale-handle race (use-after-free). 2. Add a readiness handshake in selectLive(): the background thread now signals via a std::sync::mpsc channel once the live query subscription is established (or fails). selectLive() blocks until this signal arrives, surfacing errors like "table does not exist" immediately instead of deferring them to the first next() call.
Document the threading model, locking protocol, and shutdown sequence across LiveStream.java, Surreal.java, lib.rs, live.rs, and surreal.rs so the concurrency invariants are explicit and reviewable.
Apply 43 auto-fixes and 2 targeted suppressions: - Remove redundant closures (|| std::ptr::null_mut -> std::ptr::null_mut) - Remove needless borrows (&surreal -> surreal) - Remove useless conversions (SurrealError::from(e) where e is already SurrealError, .map(JObject::from) after .l().ok()) - Replace manual match-to-Option with .ok() - Suppress redundant_closure on build_server_exception where the fix breaks JObject lifetime inference - Suppress too_many_arguments on up_record_id_range_value (JNI bridge functions mirror Java signatures)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #138
Investigation
The reported deadlock (lock-ordering issue in the Rust JNI layer) could not be confirmed — both
nextNativeandreleaseNativeacquire locks in the same order (recv_mutexthenrx_mux), and the channel-close mechanism correctly unblocksrecv(). Tests covering the exact reported scenarios (notifications after CREATE/UPDATE,close()unblockingnext(), concurrent access) all pass.However, the investigation uncovered two real issues that likely caused the reported symptoms.
Fixes
1. Surface
selectLive()subscription errors eagerlyselectLive()spawned a background thread and returned immediately, before the live query subscription was established. If the subscription failed (e.g. table does not exist), the error was silently queued in the channel and only surfaced on the firstnext()call — or worse,next()returned empty with no indication of what went wrong.The background thread now signals readiness via a
std::sync::mpscchannel.selectLive()blocks until the subscription is confirmed or an error is returned, so failures are thrown at call site.2. Make
LiveStream.handlevolatilehandlewas a plainlongfield read bynext()and written byclose()without synchronization. Ifclose()ran on one thread (freeing native memory and zeroinghandle),next()on another thread could read a stale non-zero value and pass a dangling pointer tonextNative.Code quality
LiveStream.java,Surreal.java,lib.rs,live.rs, andsurreal.rs.Tests
liveStreamReceivesCreateNotification— blocks onnext(), CREATEs a record, asserts notification arrivesliveStreamReceivesUpdateNotification— same pattern for UPDATEcloseUnblocksBlockedNext—close()from another thread unblocks a blockednext()without deadlockingconcurrentNextAndCloseDoesNotCrash— stress test with multiple threads callingnext()andclose()concurrentlyselectLiveOnNonExistentTableThrowsImmediately— verifies the new eager error behavior