Skip to content

Fix use-after-free crash when close races with ping or query#1433

Closed
bquorning wants to merge 4 commits intobrianmario:masterfrom
bquorning:fix-concurrent-close-crash
Closed

Fix use-after-free crash when close races with ping or query#1433
bquorning wants to merge 4 commits intobrianmario:masterfrom
bquorning:fix-concurrent-close-crash

Conversation

@bquorning
Copy link
Copy Markdown

@bquorning bquorning commented Apr 21, 2026

Summary

Fix a use-after-free crash when close is called from one thread while another thread is executing ping, query, or another operation without the GVL. Instead of calling mysql_close immediately (which frees the MYSQL struct while another thread is still using it), close now detects whether an operation is in-flight and takes one of two paths:

  • Idle (no in-flight operation): call mysql_close directly while holding the GVL for a clean COM_QUIT disconnect.
  • Busy (operation in-flight): set a closed flag and call shutdown(SHUT_RDWR) on the socket to interrupt the blocked I/O, then defer mysql_close to decr_mysql2_client (refcount == 0) where no concurrent access is possible.

Reproducer:

client = Mysql2::Client.new(host: "127.0.0.1", port: 3306, username: "root")
Thread.new { client.ping }
client.close

This crashes with a segfault because both nogvl_close and nogvl_ping release Ruby's GVL via rb_thread_call_without_gvl and then call libmysql functions concurrently on the same MYSQL * connection. mysql_close frees internal state while mysql_ping is still accessing it.

Why not a native mutex

A native mutex was considered but intentionally avoided: if a thread holds such a mutex at fork() time, the child inherits a permanently-locked mutex and any subsequent close or GC deadlocks. The flag + shutdown() approach avoids this problem entirely.

A Ruby-level Mutex was also discussed and dismissed in #1284 by @casperisfine, with valid arguments about method dispatch overhead, Thread.handle_interrupt handling, and behavioral changes.

Why the connection pool cannot prevent this

The pool serializes starting operations at the Ruby level, but once two threads have each passed the Ruby-level checks (one doing ping, the other doing close), they are inside rb_thread_call_without_gvl concurrently. The pool has no visibility into that window.

Changes

  • Split close into idle/busy paths as described above
  • Set active_fiber around ping's nogvl call so close can detect in-flight pings
  • Check wrapper->closed in REQUIRE_CONNECTED, closed?, and ping since mysql_close is now deferred and CONNECTED() remains true after close
  • Add a doc comment to client.c explaining the three thread safety mechanisms (active_fiber, closed flag + shutdown, refcount)
  • Add tests for concurrent close+ping, close+query, and fork safety

Related issues

Test plan

  • Added specs for concurrent ping+close, query+close, and fork safety
  • Existing test suite passes
  • rake spec:valgrind passes

🤖 Generated with Claude Code

bquorning and others added 2 commits April 21, 2026 10:11
Calling `client.close` from one thread while another thread is
executing `client.ping` (or any other MySQL operation) can crash
the process with a segfault. Both operations release Ruby's GVL
via `rb_thread_call_without_gvl` and then call libmysql functions
(`mysql_close` and `mysql_ping`) concurrently on the same `MYSQL *`
connection without synchronization. `mysql_close` frees internal
state while the other operation is still accessing it.

The existing `active_fiber` mechanism does not prevent this because:
- `ping` never sets `active_fiber`
- `close` never checks `active_fiber`
- `active_fiber` is a Ruby-level (GVL-held) check, not a native
  thread synchronization primitive

Add an `rb_nativethread_lock_t` (Ruby's cross-platform mutex) to
`mysql_client_wrapper` that serializes all `nogvl_*` functions which
access the underlying MYSQL connection: `nogvl_close`, `nogvl_ping`,
`nogvl_send_query`, `nogvl_read_query_result`, `nogvl_do_result`,
and `nogvl_select_db`.

Each protected function acquires the mutex, checks `wrapper->closed`
to bail out early if `close` already ran, performs the MySQL
operation, and releases the mutex. This ensures `mysql_close` never
runs concurrently with any other MySQL operation on the same
connection.

Behavioral change: `close` now blocks until any in-flight MySQL
operation completes, rather than calling `mysql_close` immediately.
This is necessary to prevent the use-after-free, and only affects
the rare case where `close` races with another operation.

Performance impact is negligible. An uncontended mutex lock+unlock
pair costs ~11 ns, while the MySQL operations it wraps cost
~224 µs (ping) to ~249 µs (SELECT 1) — an overhead of ~0.005%.

Reproducer:
  client = Mysql2::Client.new(host: "127.0.0.1", port: 33061, username: "root")
  Thread.new { client.ping }
  client.close

Co-authored-by: Nony Dutton <nony.dutton@zendesk.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a comment at the top of client.c explaining the three
concurrency mechanisms (active_fiber, mutex, refcount), what each
one guards against, and why they are not interchangeable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bquorning bquorning marked this pull request as ready for review April 21, 2026 14:41
it "should not crash when ping and close are called concurrently" do
100.times do
client = new_client
thread = Thread.new { client.ping }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we do client.query("SELECT SLEEP(1)") then we can get it to consistently fail without worrying about the flakiness of ping executing before client.close does. That way we can probably remove the 100.times do.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, the block runs 100 times slightly faster than 1 second so SELECT SLEEP(1) would be moderately slower.

@sodabrew
Copy link
Copy Markdown
Collaborator

This is great, thank you! Kicking off the CI run

@byroot
Copy link
Copy Markdown
Contributor

byroot commented Apr 22, 2026

Currently at a conference so can't properly review the PR before a few days, but just based on the description I'd advise against using a native mutex, as it if the thread owning it dies (typically on fork) it could result in a deadlock of the VM.

I suspect there is a simpler fix for this race.

@byroot
Copy link
Copy Markdown
Contributor

byroot commented Apr 22, 2026

So I got a few minutes to look, and yes, I believe the proper fix is to extend the use of rb_mysql_client_set_active_fiber to ping and close (and perhaps others?).

@bquorning
Copy link
Copy Markdown
Author

I’ll try that @byroot. Thanks for the review.

@bquorning
Copy link
Copy Markdown
Author

FYI @sodabrew CI seems to be failing because of expired signatures. I am not sure how to update them, but I suspect CI will continue to fail until that has been fixed.

bquorning and others added 2 commits April 22, 2026 12:30
When one thread calls close while another is blocked in a query or ping
(without the GVL), mysql_close frees the MYSQL struct that the in-flight
operation is still reading, causing a use-after-free crash.

Fix this by splitting close into two paths based on active_fiber:

- Idle (no in-flight operation): call mysql_close directly while holding
  the GVL for a clean COM_QUIT disconnect.
- Busy (operation in-flight): set a closed flag and call
  shutdown(SHUT_RDWR) on the socket to interrupt the blocked I/O, then
  defer mysql_close to decr_mysql2_client (refcount == 0) where no
  concurrent access is possible.

A native mutex is intentionally avoided: if a thread holds such a mutex
at fork() time, the child inherits a permanently-locked mutex and any
subsequent close or GC deadlocks.

Also:
- Set active_fiber around ping's nogvl call so close can detect
  in-flight pings
- Check wrapper->closed in REQUIRE_CONNECTED, closed?, and ping since
  mysql_close is now deferred and CONNECTED() remains true after close
- Add a doc comment to client.c explaining the three thread safety
  mechanisms (active_fiber, closed flag + shutdown, refcount)
- Add tests for concurrent close+ping, close+query, and fork safety

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
end
end

it "should not deadlock when forking while a query is in-flight" do
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec may be too heavy-handed to keep around. Let me know if I should delete it before merge.

@bquorning
Copy link
Copy Markdown
Author

I believe the proper fix is to extend the use of rb_mysql_client_set_active_fiber to ping and close (and perhaps others?).

I am not very comfortable in C and relying a lot on Claude Code (and you) to make the right decisions here 😅 What I’m getting back is that

  • in ping we want to return true or false, but calling rb_mysql_client_set_active_fiber will raise an exception if the connection is already in use.
  • And in the close method we don’t want to claim the connection. We want to check if anybody else is using it.

Instead we are adding a NIL_P(wrapper->active_fiber) check, which is identical to other checks already in the code.

@byroot
Copy link
Copy Markdown
Contributor

byroot commented Apr 25, 2026

  • n ping we want to return true or false, but calling rb_mysql_client_set_active_fiber will raise an exception if the connection is already in use.

I believe it's fine, it's already the design of mysql2 that concurrent usage raises. This should never happen, if it does the caller is doing something wrong.

And in the close method we don’t want to claim the connection. We want to check if anybody else is using it.

Yes, but we also don't want anyone starting using it while we're closing it, which is achieved by setting the active fiber.

The active_fiber is effectively a cheaper mutex, that raise when contented.

After glancing at your PR, one problem remaining is that you must either ensure that the APIs you are calling between setting and unsetting the current fiber will never raise, or use rb_protect (or other API to handle exceptions)

Comment thread ext/mysql2/client.c
* raising. close intentionally skips this check — it must work
* regardless of connection state.
*
* 2. closed flag + socket shutdown (C level)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary. We can rely on initialized = 0, Mysql2::Client can reconnect.

@byroot
Copy link
Copy Markdown
Contributor

byroot commented Apr 25, 2026

Simpler fix in #1436

@bquorning
Copy link
Copy Markdown
Author

Simpler fix in #1436

That looks so much better than my approach. I’ll close this PR.

@bquorning bquorning closed this Apr 25, 2026
@bquorning bquorning deleted the fix-concurrent-close-crash branch April 25, 2026 09:51
@bquorning
Copy link
Copy Markdown
Author

FYI @sodabrew CI seems to be failing because of expired signatures. I am not sure how to update them, but I suspect CI will continue to fail until that has been fixed.

Oh, a fix for the failing signatures on CI was suggested in #1424 back in February.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants