Add _throw_dmrs device override for reshape of views #3095
Merged
maleadt merged 4 commits into JuliaGPU:master on Apr 15, 2026
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage Diff | master | #3095 | +/- |
|---|---|---|---|
| Coverage | 90.31% | 90.32% | |
| Files | 141 | 141 | |
| Lines | 12165 | 12165 | |
| Hits | 10987 | 10988 | +1 |
| Misses | 1178 | 1177 | -1 |
CUDA.jl Benchmarks
| Benchmark suite | Current: 14712fa | Previous: 5065018 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101002 ns | 101779 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 76209 ns | 76943 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1584594 ns | 1585563 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 143217 ns | 144109 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 657221.5 ns | 658119 ns | 1.00 |
| array/accumulate/Int64/1d | 118204 ns | 118854 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 79400.5 ns | 80389 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1693357.5 ns | 1694708 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 155741 ns | 156166 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 961351 ns | 961594 ns | 1.00 |
| array/broadcast | 20395.5 ns | 20512 ns | 0.99 |
| array/construct | 1335 ns | 1320.1 ns | 1.01 |
| array/copy | 18666.5 ns | 18914 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 212677 ns | 214662 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 282913 ns | 285022 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11190 ns | 11357 ns | 0.99 |
| array/iteration/findall/bool | 132091 ns | 132106 ns | 1.00 |
| array/iteration/findall/int | 149004 ns | 148958 ns | 1.00 |
| array/iteration/findfirst/bool | 81362.5 ns | 82321 ns | 0.99 |
| array/iteration/findfirst/int | 83892 ns | 84247.5 ns | 1.00 |
| array/iteration/findmin/1d | 84612 ns | 85453 ns | 0.99 |
| array/iteration/findmin/2d | 116960 ns | 117211 ns | 1.00 |
| array/iteration/logical | 199494.5 ns | 200370 ns | 1.00 |
| array/iteration/scalar | 69041 ns | 69501 ns | 0.99 |
| array/permutedims/2d | 52311.5 ns | 52561 ns | 1.00 |
| array/permutedims/3d | 52770 ns | 53080 ns | 0.99 |
| array/permutedims/4d | 51628 ns | 51880.5 ns | 1.00 |
| array/random/rand/Float32 | 13019 ns | 12980 ns | 1.00 |
| array/random/rand/Int64 | 37146 ns | 29766 ns | 1.25 |
| array/random/rand!/Float32 | 8592.666666666666 ns | 8371 ns | 1.03 |
| array/random/rand!/Int64 | 34432 ns | 33996 ns | 1.01 |
| array/random/randn/Float32 | 43891 ns | 38221 ns | 1.15 |
| array/random/randn!/Float32 | 31481 ns | 31486 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34717 ns | 35499.5 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 39448.5 ns | 39750 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 51882 ns | 51923 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56756 ns | 56902 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69674 ns | 70119.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 42941.5 ns | 43301 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 43018 ns | 53296 ns | 0.81 |
| array/reductions/mapreduce/Int64/dims=1L | 87516 ns | 87758 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59713.5 ns | 60300 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 84918 ns | 85288 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34837.5 ns | 35511 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 39636 ns | 49667 ns | 0.80 |
| array/reductions/reduce/Float32/dims=1L | 51785 ns | 52100 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 56779 ns | 56986 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69644 ns | 70384 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 43093 ns | 43476 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 50624.5 ns | 42765.5 ns | 1.18 |
| array/reductions/reduce/Int64/dims=1L | 87761 ns | 87783 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59541 ns | 59896 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 84343 ns | 85038 ns | 0.99 |
| array/reverse/1d | 18184 ns | 18408 ns | 0.99 |
| array/reverse/1dL | 68791 ns | 69079 ns | 1.00 |
| array/reverse/1dL_inplace | 65905 ns | 65864 ns | 1.00 |
| array/reverse/1d_inplace | 10261.666666666666 ns | 10281.333333333334 ns | 1.00 |
| array/reverse/2d | 20690 ns | 20739 ns | 1.00 |
| array/reverse/2dL | 72630 ns | 72716 ns | 1.00 |
| array/reverse/2dL_inplace | 65964 ns | 65932 ns | 1.00 |
| array/reverse/2d_inplace | 11041 ns | 10105.5 ns | 1.09 |
| array/sorting/1d | 2734368 ns | 2734769 ns | 1.00 |
| array/sorting/2d | 1067985 ns | 1076074 ns | 0.99 |
| array/sorting/by | 3302830 ns | 3327052 ns | 0.99 |
| cuda/synchronization/context/auto | 1166.7 ns | 1184.7 ns | 0.98 |
| cuda/synchronization/context/blocking | 961.2222222222222 ns | 940.969696969697 ns | 1.02 |
| cuda/synchronization/context/nonblocking | 8403 ns | 8469.4 ns | 0.99 |
| cuda/synchronization/stream/auto | 998.2 ns | 1021.1 ns | 0.98 |
| cuda/synchronization/stream/blocking | 842.3979591836735 ns | 832.7631578947369 ns | 1.01 |
| cuda/synchronization/stream/nonblocking | 7281.5 ns | 7582.5 ns | 0.96 |
| integration/byval/reference | 143927 ns | 144056 ns | 1.00 |
| integration/byval/slices=1 | 145747 ns | 145820 ns | 1.00 |
| integration/byval/slices=2 | 284359 ns | 284683 ns | 1.00 |
| integration/byval/slices=3 | 423018 ns | 423445 ns | 1.00 |
| integration/cudadevrt | 102569 ns | 102525.5 ns | 1.00 |
| integration/volumerhs | 23436276.5 ns | 23421325.5 ns | 1.00 |
| kernel/indexing | 13354 ns | 13267 ns | 1.01 |
| kernel/indexing_checked | 14129 ns | 14039 ns | 1.01 |
| kernel/launch | 2199.5555555555557 ns | 2141.4444444444443 ns | 1.03 |
| kernel/occupancy | 668.5875 ns | 695.4267515923567 ns | 0.96 |
| kernel/rand | 18046 ns | 15508 ns | 1.16 |
| latency/import | 3832774972.5 ns | 3810464780.5 ns | 1.01 |
| latency/precompile | 4616111922 ns | 4591174243 ns | 1.01 |
| latency/ttfp | 4415613716 ns | 4390561943.5 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Member
Please don't continuously merge
maleadt approved these changes Apr 15, 2026
Problem
`reshape(@view(data[1:n*n]), (n, n))` fails to compile on the GPU. `@view` creates a `SubArray`, which has no specialized `reshape` method on the device, so it falls back to Base's generic `_reshape`. That path calls `_throw_dmrs`, which tries to construct a `DimensionMismatch` string, and string construction is unsupported on the GPU.
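
For illustration, the fallback path described above can be observed on the CPU with plain Base Julia, with no GPU or CUDA.jl involved. This is a minimal sketch, not the code from this PR: the variable names and sizes are made up for the example, and it only shows that reshaping a view succeeds when the sizes match and raises `DimensionMismatch` when they do not. On the device, constructing that error message is the step that cannot be compiled.

```julia
# CPU-only sketch using Base Julia; `n` and `data` are illustrative values.
n = 4
data = collect(1.0:20.0)        # 20 elements, more than n*n

v = @view data[1:n*n]           # SubArray over the first 16 elements
A = reshape(v, (n, n))          # sizes match: returns a reshaped wrapper, no copy

A[1, 1] = 100.0                 # writes through to the parent array

# Sizes do not match (16 elements vs. 4*5 = 20): Base's generic reshape
# path throws a DimensionMismatch whose message is built at runtime.
err = try
    reshape(v, (n, n + 1))
    nothing
catch e
    e
end

println(v isa SubArray)
println(size(A))
println(data[1])
println(err isa DimensionMismatch)
```

Because the error branch is part of the generic method, the compiler has to be able to compile it even when the sizes always match at runtime, which is why the matching-size case in the problem statement is affected too.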