Skip to content

Fix: surface selectLive() errors eagerly and improve LiveStream thread safety (#138)#139

Open
emmanuel-keller wants to merge 5 commits intomainfrom
emmanuel/fix-live-stream-138
Open

Fix: surface selectLive() errors eagerly and improve LiveStream thread safety (#138)#139
emmanuel-keller wants to merge 5 commits intomainfrom
emmanuel/fix-live-stream-138

Conversation

@emmanuel-keller
Copy link
Copy Markdown
Collaborator

@emmanuel-keller emmanuel-keller commented Apr 15, 2026

Closes #138

Investigation

The reported deadlock (lock-ordering issue in the Rust JNI layer) could not be confirmed — both nextNative and releaseNative acquire locks in the same order (recv_mutex then rx_mux), and the channel-close mechanism correctly unblocks recv(). Tests covering the exact reported scenarios (notifications after CREATE/UPDATE, close() unblocking next(), concurrent access) all pass.

However, the investigation uncovered two real issues that likely caused the reported symptoms.

Fixes

1. Surface selectLive() subscription errors eagerly

selectLive() spawned a background thread and returned immediately, before the live query subscription was established. If the subscription failed (e.g. table does not exist), the error was silently queued in the channel and only surfaced on the first next() call — or worse, next() returned empty with no indication of what went wrong.

The background thread now signals readiness via a std::sync::mpsc channel. selectLive() blocks until the subscription is confirmed or an error is returned, so failures are thrown at call site.

2. Make LiveStream.handle volatile

handle was a plain long field read by next() and written by close() without synchronization. If close() ran on one thread (freeing native memory and zeroing handle), next() on another thread could read a stale non-zero value and pass a dangling pointer to nextNative.

Code quality

  • Documented the threading model, locking protocol, and shutdown sequence across LiveStream.java, Surreal.java, lib.rs, live.rs, and surreal.rs.
  • Resolved all 45 clippy warnings in the Rust native layer (43 auto-fixes + 2 targeted suppressions).

Tests

  • liveStreamReceivesCreateNotification — blocks on next(), CREATEs a record, asserts notification arrives
  • liveStreamReceivesUpdateNotification — same pattern for UPDATE
  • closeUnblocksBlockedNextclose() from another thread unblocks a blocked next() without deadlocking
  • concurrentNextAndCloseDoesNotCrash — stress test with multiple threads calling next() and close() concurrently
  • selectLiveOnNonExistentTableThrowsImmediately — verifies the new eager error behavior

Add four new tests to LiveQueryTests covering the scenarios reported
in issue #138 (live stream notifications and concurrent close):

- liveStreamReceivesCreateNotification: blocks on next(), then CREATEs
  a record and asserts the notification arrives.
- liveStreamReceivesUpdateNotification: same pattern for UPDATE.
- closeUnblocksBlockedNext: verifies close() from another thread
  unblocks a thread blocked on next() without deadlocking.
- concurrentNextAndCloseDoesNotCrash: stress test with multiple
  threads calling next() and close() concurrently.
Two fixes for issues around live query reliability (#138):

1. Make LiveStream.handle volatile so that close() on one thread is
   visible to next() on another, preventing a stale-handle race
   (use-after-free).

2. Add a readiness handshake in selectLive(): the background thread
   now signals via a std::sync::mpsc channel once the live query
   subscription is established (or fails). selectLive() blocks until
   this signal arrives, surfacing errors like "table does not exist"
   immediately instead of deferring them to the first next() call.
@emmanuel-keller emmanuel-keller changed the title Add tests reproducing live query issues from #138 Fix: surface selectLive() errors eagerly and improve LiveStream thread safety (#138) Apr 15, 2026
Document the threading model, locking protocol, and shutdown sequence
across LiveStream.java, Surreal.java, lib.rs, live.rs, and surreal.rs
so the concurrency invariants are explicit and reviewable.
Apply 43 auto-fixes and 2 targeted suppressions:

- Remove redundant closures (|| std::ptr::null_mut -> std::ptr::null_mut)
- Remove needless borrows (&surreal -> surreal)
- Remove useless conversions (SurrealError::from(e) where e is
  already SurrealError, .map(JObject::from) after .l().ok())
- Replace manual match-to-Option with .ok()
- Suppress redundant_closure on build_server_exception where the
  fix breaks JObject lifetime inference
- Suppress too_many_arguments on up_record_id_range_value (JNI
  bridge functions mirror Java signatures)
@emmanuel-keller emmanuel-keller marked this pull request as ready for review April 15, 2026 15:48
@emmanuel-keller emmanuel-keller requested a review from kearfy April 15, 2026 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: selectLive().next() can deadlock due to incorrect locking in Rust JNI layer

1 participant