Some Vulkan drivers (observed on Adreno, Qualcomm build 923a446bf8,
driver date 09/05/24) report bufferDeviceAddress support in
VkPhysicalDeviceVulkan12Features but crash with SIGSEGV when
vkGetBufferDeviceAddress is actually called:
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
#1 ggml_vk_create_buffer+4084
#2 ggml_vk_create_buffer_device+148
#3 ggml_backend_vk_buffer_type_alloc_buffer+240
#7 whisper_model_load+5996
The crash occurs inside ggml_vk_create_buffer when
device->device.getBufferAddress() is called: the call goes through a
driver-internal function pointer that is null.
After creating the logical device, verify that the function pointer
resolves via vkGetDeviceProcAddr and that a test call returns a
non-zero address. If either check fails, disable
buffer_device_address so all guarded code paths skip BDA.
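
The check described above could look roughly like the sketch below. It is self-contained and does not include the Vulkan headers: `get_address_fn`, `device_state` and `validate_bda` are hypothetical stand-ins invented for illustration (a real build would use the pointer returned by vkGetDeviceProcAddr and a real scratch buffer), so treat it as an outline of the logic rather than the code in this patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the entry point that vkGetDeviceProcAddr would
// return for "vkGetBufferDeviceAddress". A real build would use the
// function-pointer type from the Vulkan headers instead.
using get_address_fn = std::uint64_t (*)(std::uint64_t buffer);

// Hypothetical subset of the device state touched by the check.
struct device_state {
    bool buffer_device_address = true;  // as advertised by the feature bits
};

// Downgrade the advertised capability if the entry point did not resolve,
// or if a test query on a scratch buffer yields a zero address.
inline void validate_bda(device_state & dev, get_address_fn fn, std::uint64_t scratch_buffer) {
    if (!dev.buffer_device_address) {
        return;  // feature not advertised, nothing to validate
    }
    if (fn == nullptr || fn(scratch_buffer) == 0) {
        dev.buffer_device_address = false;  // guarded code paths now skip BDA
    }
}
```

The null check runs first, so an unresolved pointer is never called.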
Some Vulkan drivers (observed on Adreno, Qualcomm build 923a446bf8)
fail to compile compute shaders at runtime, reporting
"Failed to link shaders" and returning ErrorUnknown from
createComputePipeline. Previously this either threw a C++ exception
that propagated as an uncaught abort, or the resulting null pipeline
was dispatched, causing a SIGSEGV:
AdrenoVK-0: Failed to link shaders.
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xe8
#1 ggml_vk_dispatch_pipeline<vk_mat_mat_push_constants>+360
#2 ggml_vk_mul_mat_q_f16+6616
#3 ggml_backend_vk_graph_compute+41780
Three changes:
1. ggml_vk_create_pipeline_func: catch the exception, increment
device->pipeline_failures, clean up the shader module, and return
instead of rethrowing. Also handle null pipeline after creation.
2. ggml_vk_dispatch_pipeline: early-return if the pipeline is null or
not compiled (safety net against dispatching broken pipelines).
3. ggml_backend_vk_device_supports_op: return false for all ops when
pipeline_failures > 0, causing the backend scheduler to route
everything to the CPU backend. The GPU is still used for buffer
allocation but all compute runs on CPU.
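
The three changes fit together roughly as in the sketch below. It uses simplified, hypothetical types (`pipeline`, `device`, and a `driver_create` callback standing in for createComputePipeline), not the real ggml structures, and only illustrates the control flow: count the failure instead of rethrowing, refuse to dispatch an uncompiled pipeline, and report every op as unsupported once any failure has been counted.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical, simplified stand-ins for the backend's pipeline and device.
struct pipeline { bool compiled = false; };
struct device   { int pipeline_failures = 0; };

// Change 1: swallow the creation failure and count it instead of rethrowing.
// driver_create models createComputePipeline; returning false models a
// null pipeline handle, throwing models ErrorUnknown.
inline void create_pipeline(device & dev, pipeline & p,
                            const std::function<bool()> & driver_create) {
    try {
        if (!driver_create()) {
            dev.pipeline_failures++;
            return;
        }
    } catch (const std::exception &) {
        dev.pipeline_failures++;
        return;  // shader module cleanup would happen here
    }
    p.compiled = true;
}

// Change 2: safety net, never dispatch a missing or uncompiled pipeline.
inline bool dispatch_pipeline(const pipeline * p) {
    if (p == nullptr || !p->compiled) {
        return false;
    }
    // ... command recording would go here ...
    return true;
}

// Change 3: once any pipeline failed, claim no op is supported so the
// scheduler routes all compute to the CPU backend.
inline bool supports_op(const device & dev) {
    if (dev.pipeline_failures > 0) {
        return false;
    }
    return true;  // the real per-op checks would follow here
}
```

A single failed pipeline disables GPU compute for the whole device in this scheme, which is coarse but keeps the fallback logic in one place.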
Previously, missing storageBuffer16BitAccess threw
std::runtime_error("Unsupported device") which crashed the process on
platforms where C++ exceptions propagate as uncaught aborts (Android).
Some drivers also report the feature bit but don't enumerate
VK_KHR_16bit_storage as a device extension — pushing it into
device_extensions then causes vkCreateDevice to fail with
ErrorExtensionNotPresent (another fatal abort on Android).
Instead of throwing, set pipeline_failures so supports_op returns
false for all ops and the backend scheduler routes everything to CPU.
Only push VK_KHR_16bit_storage when the extension is actually
enumerated.
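
A minimal sketch of that decision, assuming the feature bit and the enumerated extension list are checked independently. `storage16_decision` and `decide_16bit_storage` are hypothetical names used only for illustration, and the extension list is modeled as plain strings rather than the structures returned by the driver.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical result type for the two independent decisions.
struct storage16_decision {
    bool push_extension;    // add VK_KHR_16bit_storage to device_extensions
    bool fall_back_to_cpu;  // set pipeline_failures so supports_op rejects all ops
};

// feature_reported: the storageBuffer16BitAccess feature bit.
// enumerated: names of the device extensions the driver actually lists.
inline storage16_decision decide_16bit_storage(bool feature_reported,
                                               const std::vector<std::string> & enumerated) {
    const bool listed =
        std::find(enumerated.begin(), enumerated.end(), "VK_KHR_16bit_storage") != enumerated.end();
    // Assumption: the extension is requested only when it is enumerated, and
    // the CPU fallback is taken only when the feature bit itself is missing.
    return { listed, !feature_reported };
}
```

With this split, a driver that reports the feature bit but does not list the extension still runs on the GPU, and the extension is simply not requested from vkCreateDevice.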
whisper.cpp crashes on certain Adreno GPUs (such as the Adreno 642L), so it should fall back to the CPU to avoid the shader linking issue, among other things.
These patches only aim to fix the crash, not the underlying issue. Running on the CPU is pretty slow, especially on older ARM CPUs from when Snapdragon mobile platform model numbers were still three-digit numbers.
I'm not very good at C++ and don't have much insight into whisper.cpp, so any improvements are welcome.
Leaving these here so others can test these patches:
Fixes #2411
Fixes #3035
Fixes #2765
Related: #2415
Related: #3168
Related: ggml-org/llama.cpp#12421
Related: ggml-org/llama.cpp#6395
Related: ggml-org/llama.cpp#13450