
Add tests for fast_div and fast rcp intrinsics #3096

Merged
maleadt merged 2 commits into master from vc/faster_div
Apr 15, 2026


@vchuravy
Member

Follow-up to #3077

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions (bot, Contributor) left a comment

CUDA.jl Benchmarks

Benchmark suite   Current: ab56cd0   Previous: 5065018   Ratio (Current / Previous; below 1.00 means faster)
array/accumulate/Float32/1d 101208 ns 101779 ns 0.99
array/accumulate/Float32/dims=1 76528 ns 76943 ns 0.99
array/accumulate/Float32/dims=1L 1585194 ns 1585563 ns 1.00
array/accumulate/Float32/dims=2 143253 ns 144109 ns 0.99
array/accumulate/Float32/dims=2L 657669 ns 658119 ns 1.00
array/accumulate/Int64/1d 118532 ns 118854 ns 1.00
array/accumulate/Int64/dims=1 80227 ns 80389 ns 1.00
array/accumulate/Int64/dims=1L 1695245 ns 1694708 ns 1.00
array/accumulate/Int64/dims=2 156577 ns 156166 ns 1.00
array/accumulate/Int64/dims=2L 962022.5 ns 961594 ns 1.00
array/broadcast 20810 ns 20512 ns 1.01
array/construct 1313.85 ns 1320.1 ns 1.00
array/copy 18968 ns 18914 ns 1.00
array/copyto!/cpu_to_gpu 217082 ns 214662 ns 1.01
array/copyto!/gpu_to_cpu 283361 ns 285022 ns 0.99
array/copyto!/gpu_to_gpu 11529 ns 11357 ns 1.02
array/iteration/findall/bool 131938 ns 132106 ns 1.00
array/iteration/findall/int 148879 ns 148958 ns 1.00
array/iteration/findfirst/bool 81910 ns 82321 ns 1.00
array/iteration/findfirst/int 84045.5 ns 84247.5 ns 1.00
array/iteration/findmin/1d 84855 ns 85453 ns 0.99
array/iteration/findmin/2d 117375 ns 117211 ns 1.00
array/iteration/logical 200038 ns 200370 ns 1.00
array/iteration/scalar 68049 ns 69501 ns 0.98
array/permutedims/2d 52836 ns 52561 ns 1.01
array/permutedims/3d 52824.5 ns 53080 ns 1.00
array/permutedims/4d 51488 ns 51880.5 ns 0.99
array/random/rand/Float32 13093 ns 12980 ns 1.01
array/random/rand/Int64 31000 ns 29766 ns 1.04
array/random/rand!/Float32 8495.666666666666 ns 8371 ns 1.01
array/random/rand!/Int64 34060 ns 33996 ns 1.00
array/random/randn/Float32 38197 ns 38221 ns 1.00
array/random/randn!/Float32 31407 ns 31486 ns 1.00
array/reductions/mapreduce/Float32/1d 35181 ns 35499.5 ns 0.99
array/reductions/mapreduce/Float32/dims=1 43967.5 ns 39750 ns 1.11
array/reductions/mapreduce/Float32/dims=1L 51836 ns 51923 ns 1.00
array/reductions/mapreduce/Float32/dims=2 56969.5 ns 56902 ns 1.00
array/reductions/mapreduce/Float32/dims=2L 69383.5 ns 70119.5 ns 0.99
array/reductions/mapreduce/Int64/1d 43171 ns 43301 ns 1.00
array/reductions/mapreduce/Int64/dims=1 42905.5 ns 53296 ns 0.81
array/reductions/mapreduce/Int64/dims=1L 87788 ns 87758 ns 1.00
array/reductions/mapreduce/Int64/dims=2 59737 ns 60300 ns 0.99
array/reductions/mapreduce/Int64/dims=2L 85291 ns 85288 ns 1.00
array/reductions/reduce/Float32/1d 35370 ns 35511 ns 1.00
array/reductions/reduce/Float32/dims=1 40119 ns 49667 ns 0.81
array/reductions/reduce/Float32/dims=1L 51989 ns 52100 ns 1.00
array/reductions/reduce/Float32/dims=2 57339 ns 56986 ns 1.01
array/reductions/reduce/Float32/dims=2L 70274 ns 70384 ns 1.00
array/reductions/reduce/Int64/1d 43053 ns 43476 ns 0.99
array/reductions/reduce/Int64/dims=1 50490 ns 42765.5 ns 1.18
array/reductions/reduce/Int64/dims=1L 87932 ns 87783 ns 1.00
array/reductions/reduce/Int64/dims=2 60003 ns 59896 ns 1.00
array/reductions/reduce/Int64/dims=2L 85024.5 ns 85038 ns 1.00
array/reverse/1d 18101 ns 18408 ns 0.98
array/reverse/1dL 68759 ns 69079 ns 1.00
array/reverse/1dL_inplace 65860 ns 65864 ns 1.00
array/reverse/1d_inplace 10185.333333333334 ns 10281.333333333334 ns 0.99
array/reverse/2d 20573 ns 20739 ns 0.99
array/reverse/2dL 72719 ns 72716 ns 1.00
array/reverse/2dL_inplace 65820 ns 65932 ns 1.00
array/reverse/2d_inplace 10668 ns 10105.5 ns 1.06
array/sorting/1d 2735643 ns 2734769 ns 1.00
array/sorting/2d 1069201 ns 1076074 ns 0.99
array/sorting/by 3304362 ns 3327052 ns 0.99
cuda/synchronization/context/auto 1178.9 ns 1184.7 ns 1.00
cuda/synchronization/context/blocking 928.8235294117648 ns 940.969696969697 ns 0.99
cuda/synchronization/context/nonblocking 7010.2 ns 8469.4 ns 0.83
cuda/synchronization/stream/auto 986.5555555555555 ns 1021.1 ns 0.97
cuda/synchronization/stream/blocking 780.0833333333334 ns 832.7631578947369 ns 0.94
cuda/synchronization/stream/nonblocking 7285.5 ns 7582.5 ns 0.96
integration/byval/reference 143932 ns 144056 ns 1.00
integration/byval/slices=1 145654 ns 145820 ns 1.00
integration/byval/slices=2 284592 ns 284683 ns 1.00
integration/byval/slices=3 423021 ns 423445 ns 1.00
integration/cudadevrt 102549 ns 102525.5 ns 1.00
integration/volumerhs 23440557.5 ns 23421325.5 ns 1.00
kernel/indexing 13083 ns 13267 ns 0.99
kernel/indexing_checked 13981 ns 14039 ns 1.00
kernel/launch 2119.2 ns 2141.4444444444443 ns 0.99
kernel/occupancy 665.6100628930818 ns 695.4267515923567 ns 0.96
kernel/rand 14345 ns 15508 ns 0.93
latency/import 3801168926.5 ns 3810464780.5 ns 1.00
latency/precompile 4587956035 ns 4591174243 ns 1.00
latency/ttfp 4382456999 ns 4390561943.5 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt
Member

maleadt commented Apr 14, 2026

CI failure related.

@codecov

codecov Bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.42%. Comparing base (01a0795) to head (ab56cd0).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3096      +/-   ##
==========================================
- Coverage   90.43%   90.42%   -0.01%     
==========================================
  Files         141      141              
  Lines       12025    12025              
==========================================
- Hits        10875    10874       -1     
- Misses       1150     1151       +1     


maleadt merged commit 7a46bf3 into master on Apr 15, 2026
2 checks passed
maleadt deleted the vc/faster_div branch on April 15, 2026, 07:52

Labels

None yet

Projects

None yet


2 participants