Add _throw_dmrs device override for reshape of views #3095

Merged: maleadt merged 4 commits into JuliaGPU:master from Abdelrahman912:add_reshape_view_dispatch, Apr 15, 2026

Conversation

@Abdelrahman912 (Contributor)

Problem

reshape(@view(data[1:n*n]), (n, n)) fails to compile on the GPU. @view creates a SubArray, which has no specialized reshape method on the device, so the call falls back to Base's generic _reshape. That path calls _throw_dmrs, which constructs a DimensionMismatch message; dynamic string construction is unsupported in GPU code, so compilation aborts:

Reason: unsupported call to an external C function
Stacktrace:
 [1] _string_n
   @ ./strings/string.jl:109
 [2] StringMemory
   @ ./iobuffer.jl:167
 [3] dec
   @ ./intfuncs.jl:918
 [4] #string#403
   @ ./intfuncs.jl:1000
 [5] multiple call sites
   @ unknown:0
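A minimal reproducer along the lines of the failing call might look as follows (a sketch; the kernel and array sizes are illustrative, not taken from the PR):

```julia
using CUDA

# Hypothetical minimal reproducer: reshaping a SubArray inside a kernel
# dispatches to Base's generic _reshape, whose length-check error path
# (_throw_dmrs) builds a string on the device and fails to compile.
function kernel(data, n)
    A = reshape(@view(data[1:n*n]), (n, n))  # triggers the error pre-fix
    @inbounds A[1, 1] = 0f0
    return
end

data = CUDA.rand(Float32, 16)
@cuda kernel(data, 4)
```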
   
function _reshape(parent::AbstractArray, dims::Dims)
  n = length(parent)
  prod(dims) == n || _throw_dmrs(n, "size", dims)
  __reshape((parent, IndexStyle(parent)), dims)
end

@noinline function _throw_dmrs(n, str, dims)
  throw(DimensionMismatch("parent has $n elements, which is incompatible with $str $dims")) ## THIS IS THE CULPRIT
end
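A fix in the spirit of this PR can be sketched as a device override, following the pattern CUDA.jl already uses for other Base error paths in src/device/quirks.jl (the exact error message below is illustrative, not the one merged):

```julia
# Sketch of a device-side override for Base._throw_dmrs. @device_override
# substitutes this method when compiling for the GPU, and @print_and_throw
# raises an exception with a compile-time constant message, avoiding the
# dynamic string interpolation that the Base method performs.
@device_override @noinline Base._throw_dmrs(n, str, dims) =
    @print_and_throw(c"new dimensions are incompatible with array size")
```

With such an override in place, the reshape of a view compiles, and an actual size mismatch still throws on the device, just with a static message instead of an interpolated one.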

codecov bot commented Apr 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.32%. Comparing base (5065018) to head (14712fa).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3095   +/-   ##
=======================================
  Coverage   90.31%   90.32%           
=======================================
  Files         141      141           
  Lines       12165    12165           
=======================================
+ Hits        10987    10988    +1     
+ Misses       1178     1177    -1     

github-actions bot left a comment

CUDA.jl Benchmarks
Benchmark suite Current: 14712fa Previous: 5065018 Ratio
array/accumulate/Float32/1d 101002 ns 101779 ns 0.99
array/accumulate/Float32/dims=1 76209 ns 76943 ns 0.99
array/accumulate/Float32/dims=1L 1584594 ns 1585563 ns 1.00
array/accumulate/Float32/dims=2 143217 ns 144109 ns 0.99
array/accumulate/Float32/dims=2L 657221.5 ns 658119 ns 1.00
array/accumulate/Int64/1d 118204 ns 118854 ns 0.99
array/accumulate/Int64/dims=1 79400.5 ns 80389 ns 0.99
array/accumulate/Int64/dims=1L 1693357.5 ns 1694708 ns 1.00
array/accumulate/Int64/dims=2 155741 ns 156166 ns 1.00
array/accumulate/Int64/dims=2L 961351 ns 961594 ns 1.00
array/broadcast 20395.5 ns 20512 ns 0.99
array/construct 1335 ns 1320.1 ns 1.01
array/copy 18666.5 ns 18914 ns 0.99
array/copyto!/cpu_to_gpu 212677 ns 214662 ns 0.99
array/copyto!/gpu_to_cpu 282913 ns 285022 ns 0.99
array/copyto!/gpu_to_gpu 11190 ns 11357 ns 0.99
array/iteration/findall/bool 132091 ns 132106 ns 1.00
array/iteration/findall/int 149004 ns 148958 ns 1.00
array/iteration/findfirst/bool 81362.5 ns 82321 ns 0.99
array/iteration/findfirst/int 83892 ns 84247.5 ns 1.00
array/iteration/findmin/1d 84612 ns 85453 ns 0.99
array/iteration/findmin/2d 116960 ns 117211 ns 1.00
array/iteration/logical 199494.5 ns 200370 ns 1.00
array/iteration/scalar 69041 ns 69501 ns 0.99
array/permutedims/2d 52311.5 ns 52561 ns 1.00
array/permutedims/3d 52770 ns 53080 ns 0.99
array/permutedims/4d 51628 ns 51880.5 ns 1.00
array/random/rand/Float32 13019 ns 12980 ns 1.00
array/random/rand/Int64 37146 ns 29766 ns 1.25
array/random/rand!/Float32 8592.666666666666 ns 8371 ns 1.03
array/random/rand!/Int64 34432 ns 33996 ns 1.01
array/random/randn/Float32 43891 ns 38221 ns 1.15
array/random/randn!/Float32 31481 ns 31486 ns 1.00
array/reductions/mapreduce/Float32/1d 34717 ns 35499.5 ns 0.98
array/reductions/mapreduce/Float32/dims=1 39448.5 ns 39750 ns 0.99
array/reductions/mapreduce/Float32/dims=1L 51882 ns 51923 ns 1.00
array/reductions/mapreduce/Float32/dims=2 56756 ns 56902 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 69674 ns 70119.5 ns 0.99
array/reductions/mapreduce/Int64/1d 42941.5 ns 43301 ns 0.99
array/reductions/mapreduce/Int64/dims=1 43018 ns 53296 ns 0.81
array/reductions/mapreduce/Int64/dims=1L 87516 ns 87758 ns 1.00
array/reductions/mapreduce/Int64/dims=2 59713.5 ns 60300 ns 0.99
array/reductions/mapreduce/Int64/dims=2L 84918 ns 85288 ns 1.00
array/reductions/reduce/Float32/1d 34837.5 ns 35511 ns 0.98
array/reductions/reduce/Float32/dims=1 39636 ns 49667 ns 0.80
array/reductions/reduce/Float32/dims=1L 51785 ns 52100 ns 0.99
array/reductions/reduce/Float32/dims=2 56779 ns 56986 ns 1.00
array/reductions/reduce/Float32/dims=2L 69644 ns 70384 ns 0.99
array/reductions/reduce/Int64/1d 43093 ns 43476 ns 0.99
array/reductions/reduce/Int64/dims=1 50624.5 ns 42765.5 ns 1.18
array/reductions/reduce/Int64/dims=1L 87761 ns 87783 ns 1.00
array/reductions/reduce/Int64/dims=2 59541 ns 59896 ns 0.99
array/reductions/reduce/Int64/dims=2L 84343 ns 85038 ns 0.99
array/reverse/1d 18184 ns 18408 ns 0.99
array/reverse/1dL 68791 ns 69079 ns 1.00
array/reverse/1dL_inplace 65905 ns 65864 ns 1.00
array/reverse/1d_inplace 10261.666666666666 ns 10281.333333333334 ns 1.00
array/reverse/2d 20690 ns 20739 ns 1.00
array/reverse/2dL 72630 ns 72716 ns 1.00
array/reverse/2dL_inplace 65964 ns 65932 ns 1.00
array/reverse/2d_inplace 11041 ns 10105.5 ns 1.09
array/sorting/1d 2734368 ns 2734769 ns 1.00
array/sorting/2d 1067985 ns 1076074 ns 0.99
array/sorting/by 3302830 ns 3327052 ns 0.99
cuda/synchronization/context/auto 1166.7 ns 1184.7 ns 0.98
cuda/synchronization/context/blocking 961.2222222222222 ns 940.969696969697 ns 1.02
cuda/synchronization/context/nonblocking 8403 ns 8469.4 ns 0.99
cuda/synchronization/stream/auto 998.2 ns 1021.1 ns 0.98
cuda/synchronization/stream/blocking 842.3979591836735 ns 832.7631578947369 ns 1.01
cuda/synchronization/stream/nonblocking 7281.5 ns 7582.5 ns 0.96
integration/byval/reference 143927 ns 144056 ns 1.00
integration/byval/slices=1 145747 ns 145820 ns 1.00
integration/byval/slices=2 284359 ns 284683 ns 1.00
integration/byval/slices=3 423018 ns 423445 ns 1.00
integration/cudadevrt 102569 ns 102525.5 ns 1.00
integration/volumerhs 23436276.5 ns 23421325.5 ns 1.00
kernel/indexing 13354 ns 13267 ns 1.01
kernel/indexing_checked 14129 ns 14039 ns 1.01
kernel/launch 2199.5555555555557 ns 2141.4444444444443 ns 1.03
kernel/occupancy 668.5875 ns 695.4267515923567 ns 0.96
kernel/rand 18046 ns 15508 ns 1.16
latency/import 3832774972.5 ns 3810464780.5 ns 1.01
latency/precompile 4616111922 ns 4591174243 ns 1.01
latency/ttfp 4415613716 ns 4390561943.5 ns 1.01

This comment was automatically generated by workflow using github-action-benchmark.

maleadt (Member) commented Apr 15, 2026

Please don't continuously merge master. That's a waste of CI resources.

@maleadt maleadt merged commit c27d64b into JuliaGPU:master Apr 15, 2026
0 of 2 checks passed
@Abdelrahman912 Abdelrahman912 deleted the add_reshape_view_dispatch branch April 15, 2026 13:37