Add tests for fast_div and fast rcp intrinsics #3096
Merged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
CUDA.jl Benchmarks
| Benchmark suite | Current: ab56cd0 | Previous: 5065018 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101208 ns | 101779 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 76528 ns | 76943 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1585194 ns | 1585563 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 143253 ns | 144109 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 657669 ns | 658119 ns | 1.00 |
| array/accumulate/Int64/1d | 118532 ns | 118854 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80227 ns | 80389 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1695245 ns | 1694708 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 156577 ns | 156166 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 962022.5 ns | 961594 ns | 1.00 |
| array/broadcast | 20810 ns | 20512 ns | 1.01 |
| array/construct | 1313.85 ns | 1320.1 ns | 1.00 |
| array/copy | 18968 ns | 18914 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 217082 ns | 214662 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 283361 ns | 285022 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11529 ns | 11357 ns | 1.02 |
| array/iteration/findall/bool | 131938 ns | 132106 ns | 1.00 |
| array/iteration/findall/int | 148879 ns | 148958 ns | 1.00 |
| array/iteration/findfirst/bool | 81910 ns | 82321 ns | 1.00 |
| array/iteration/findfirst/int | 84045.5 ns | 84247.5 ns | 1.00 |
| array/iteration/findmin/1d | 84855 ns | 85453 ns | 0.99 |
| array/iteration/findmin/2d | 117375 ns | 117211 ns | 1.00 |
| array/iteration/logical | 200038 ns | 200370 ns | 1.00 |
| array/iteration/scalar | 68049 ns | 69501 ns | 0.98 |
| array/permutedims/2d | 52836 ns | 52561 ns | 1.01 |
| array/permutedims/3d | 52824.5 ns | 53080 ns | 1.00 |
| array/permutedims/4d | 51488 ns | 51880.5 ns | 0.99 |
| array/random/rand/Float32 | 13093 ns | 12980 ns | 1.01 |
| array/random/rand/Int64 | 31000 ns | 29766 ns | 1.04 |
| array/random/rand!/Float32 | 8495.666666666666 ns | 8371 ns | 1.01 |
| array/random/rand!/Int64 | 34060 ns | 33996 ns | 1.00 |
| array/random/randn/Float32 | 38197 ns | 38221 ns | 1.00 |
| array/random/randn!/Float32 | 31407 ns | 31486 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 35181 ns | 35499.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 43967.5 ns | 39750 ns | 1.11 |
| array/reductions/mapreduce/Float32/dims=1L | 51836 ns | 51923 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 56969.5 ns | 56902 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69383.5 ns | 70119.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43171 ns | 43301 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 42905.5 ns | 53296 ns | 0.81 |
| array/reductions/mapreduce/Int64/dims=1L | 87788 ns | 87758 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59737 ns | 60300 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 85291 ns | 85288 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 35370 ns | 35511 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 40119 ns | 49667 ns | 0.81 |
| array/reductions/reduce/Float32/dims=1L | 51989 ns | 52100 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 57339 ns | 56986 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 70274 ns | 70384 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43053 ns | 43476 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 50490 ns | 42765.5 ns | 1.18 |
| array/reductions/reduce/Int64/dims=1L | 87932 ns | 87783 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 60003 ns | 59896 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 85024.5 ns | 85038 ns | 1.00 |
| array/reverse/1d | 18101 ns | 18408 ns | 0.98 |
| array/reverse/1dL | 68759 ns | 69079 ns | 1.00 |
| array/reverse/1dL_inplace | 65860 ns | 65864 ns | 1.00 |
| array/reverse/1d_inplace | 10185.333333333334 ns | 10281.333333333334 ns | 0.99 |
| array/reverse/2d | 20573 ns | 20739 ns | 0.99 |
| array/reverse/2dL | 72719 ns | 72716 ns | 1.00 |
| array/reverse/2dL_inplace | 65820 ns | 65932 ns | 1.00 |
| array/reverse/2d_inplace | 10668 ns | 10105.5 ns | 1.06 |
| array/sorting/1d | 2735643 ns | 2734769 ns | 1.00 |
| array/sorting/2d | 1069201 ns | 1076074 ns | 0.99 |
| array/sorting/by | 3304362 ns | 3327052 ns | 0.99 |
| cuda/synchronization/context/auto | 1178.9 ns | 1184.7 ns | 1.00 |
| cuda/synchronization/context/blocking | 928.8235294117648 ns | 940.969696969697 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 7010.2 ns | 8469.4 ns | 0.83 |
| cuda/synchronization/stream/auto | 986.5555555555555 ns | 1021.1 ns | 0.97 |
| cuda/synchronization/stream/blocking | 780.0833333333334 ns | 832.7631578947369 ns | 0.94 |
| cuda/synchronization/stream/nonblocking | 7285.5 ns | 7582.5 ns | 0.96 |
| integration/byval/reference | 143932 ns | 144056 ns | 1.00 |
| integration/byval/slices=1 | 145654 ns | 145820 ns | 1.00 |
| integration/byval/slices=2 | 284592 ns | 284683 ns | 1.00 |
| integration/byval/slices=3 | 423021 ns | 423445 ns | 1.00 |
| integration/cudadevrt | 102549 ns | 102525.5 ns | 1.00 |
| integration/volumerhs | 23440557.5 ns | 23421325.5 ns | 1.00 |
| kernel/indexing | 13083 ns | 13267 ns | 0.99 |
| kernel/indexing_checked | 13981 ns | 14039 ns | 1.00 |
| kernel/launch | 2119.2 ns | 2141.4444444444443 ns | 0.99 |
| kernel/occupancy | 665.6100628930818 ns | 695.4267515923567 ns | 0.96 |
| kernel/rand | 14345 ns | 15508 ns | 0.93 |
| latency/import | 3801168926.5 ns | 3810464780.5 ns | 1.00 |
| latency/precompile | 4587956035 ns | 4591174243 ns | 1.00 |
| latency/ttfp | 4382456999 ns | 4390561943.5 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Member
CI failure related.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

    @@            Coverage Diff             @@
    ##           master    #3096      +/-   ##
    ==========================================
    - Coverage   90.43%   90.42%   -0.01%
    ==========================================
      Files         141      141
      Lines       12025    12025
    ==========================================
    - Hits        10875    10874       -1
    - Misses       1150     1151       +1
Follow-up to #3077