Always use generated aligned_sizeof
#2838
Your PR requires formatting changes to meet the project's style guidelines. The suggested changes:

```diff
diff --git a/src/CUDA.jl b/src/CUDA.jl
index 8a82201a0..97402ecd5 100644
--- a/src/CUDA.jl
+++ b/src/CUDA.jl
@@ -54,7 +54,7 @@ using Printf
 # - Base.aligned_sizeof is the size of an object in an array/inline alloced
 # Both of them are equivalent for immutable objects, but differ for mutable singtons and Symbol
 # We use `aligned_sizeof` since we care about the size of a type in an array
-@generated function aligned_sizeof(::Type{T}) where T
+@generated function aligned_sizeof(::Type{T}) where {T}
     return :($(Base.aligned_sizeof(T)))
 end
```
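A minimal sketch of the difference the comment describes, using two hypothetical types (`Pair32`, an immutable isbits struct, and `Flag`, a zero-field mutable struct) and a hypothetical function name `aligned_sizeof_lit` so it does not clash with CUDA.jl's own `aligned_sizeof`. Note that `Base.aligned_sizeof` is an internal, unexported Base function, so its behavior may change between Julia versions.

```julia
# Hypothetical example types (not from CUDA.jl)
struct Pair32            # immutable and isbits: stored inline in an array
    a::Int32
    b::Int32
end

mutable struct Flag end  # zero-field mutable struct: an array stores a pointer to it

# Same pattern as the PR: Base.aligned_sizeof(T) runs once, when the method body
# is generated for a concrete T, and the resulting integer is spliced in as a literal.
@generated function aligned_sizeof_lit(::Type{T}) where {T}
    return :($(Base.aligned_sizeof(T)))
end

for T in (Pair32, Flag)
    println(T, ": sizeof = ", sizeof(T),
            ", Base.aligned_sizeof = ", Base.aligned_sizeof(T),
            ", generated = ", aligned_sizeof_lit(T))
end
```

For `Pair32` both functions agree (8 bytes), while for `Flag` `sizeof` reports the object's own size (0 bytes) and `Base.aligned_sizeof` reports the size of the pointer an array would store.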
Codecov Report
✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##           master    #2838      +/-   ##
==========================================
- Coverage   89.78%   89.77%   -0.02%
==========================================
  Files         150      150
  Lines       13229    13229
==========================================
- Hits        11878    11876       -2
- Misses       1351     1353       +2
```
CUDA.jl Benchmarks
| Benchmark suite | Current: 659d67f | Previous: c05359d | Ratio |
|---|---|---|---|
| latency/precompile | 42971664043.5 ns | 42922336650.5 ns | 1.00 |
| latency/ttfp | 6993096215 ns | 7015168424 ns | 1.00 |
| latency/import | 3570402743 ns | 3571269514 ns | 1.00 |
| integration/volumerhs | 9607559 ns | 9608723 ns | 1.00 |
| integration/byval/slices=1 | 146787 ns | 146920.5 ns | 1.00 |
| integration/byval/slices=3 | 425733 ns | 425845 ns | 1.00 |
| integration/byval/reference | 144985 ns | 145020 ns | 1.00 |
| integration/byval/slices=2 | 286327 ns | 286380 ns | 1.00 |
| integration/cudadevrt | 103478 ns | 103554 ns | 1.00 |
| kernel/indexing | 14271 ns | 14235 ns | 1.00 |
| kernel/indexing_checked | 14751 ns | 14711 ns | 1.00 |
| kernel/occupancy | 673.8924050632911 ns | 672.5506329113924 ns | 1.00 |
| kernel/launch | 2185.4444444444443 ns | 2270.3333333333335 ns | 0.96 |
| kernel/rand | 14825 ns | 14669 ns | 1.01 |
| array/reverse/1d | 19626 ns | 19682 ns | 1.00 |
| array/reverse/2d | 23222.5 ns | 23613.5 ns | 0.98 |
| array/reverse/1d_inplace | 10009 ns | 10461 ns | 0.96 |
| array/reverse/2d_inplace | 11700 ns | 13212 ns | 0.89 |
| array/copy | 20785 ns | 20972 ns | 0.99 |
| array/iteration/findall/int | 157667 ns | 157808 ns | 1.00 |
| array/iteration/findall/bool | 139824 ns | 139837 ns | 1.00 |
| array/iteration/findfirst/int | 157411 ns | 164937 ns | 0.95 |
| array/iteration/findfirst/bool | 157968 ns | 165868 ns | 0.95 |
| array/iteration/scalar | 72335 ns | 73041 ns | 0.99 |
| array/iteration/logical | 213993.5 ns | 214850 ns | 1.00 |
| array/iteration/findmin/1d | 46243 ns | 46704 ns | 0.99 |
| array/iteration/findmin/2d | 96178.5 ns | 96962.5 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 43151.5 ns | 46033 ns | 0.94 |
| array/reductions/reduce/Int64/dims=1 | 47530.5 ns | 55193 ns | 0.86 |
| array/reductions/reduce/Int64/dims=2 | 61834.5 ns | 62917 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1L | 88681 ns | 88869 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87096 ns | 87079 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34057 ns | 34606 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 41224 ns | 43875 ns | 0.94 |
| array/reductions/reduce/Float32/dims=2 | 59333 ns | 59705 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52069 ns | 52260 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69431 ns | 70051.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43068 ns | 42671.5 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 50494 ns | 45980 ns | 1.10 |
| array/reductions/mapreduce/Int64/dims=2 | 61817.5 ns | 62143.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 88523 ns | 88812 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 86431.5 ns | 86818 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 33770 ns | 34742 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1 | 41333.5 ns | 43090.5 ns | 0.96 |
| array/reductions/mapreduce/Float32/dims=2 | 59688 ns | 60061 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 52355 ns | 52528 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69878 ns | 70191 ns | 1.00 |
| array/broadcast | 20001.5 ns | 20155 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11257 ns | 11294 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 215839 ns | 216503 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 283928 ns | 284237 ns | 1.00 |
| array/accumulate/Int64/1d | 124425 ns | 125529 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 83088 ns | 84037 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 157834 ns | 159166 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1708959.5 ns | 1720376 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 966206.5 ns | 968348 ns | 1.00 |
| array/accumulate/Float32/1d | 108670 ns | 109984 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80472 ns | 81082 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 147446 ns | 148760 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1618565 ns | 1629307.5 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 697859 ns | 701479 ns | 0.99 |
| array/construct | 1292.8 ns | 1287.2 ns | 1.00 |
| array/random/randn/Float32 | 47586 ns | 44176 ns | 1.08 |
| array/random/randn!/Float32 | 24637 ns | 24930 ns | 0.99 |
| array/random/rand!/Int64 | 27389 ns | 27547 ns | 0.99 |
| array/random/rand!/Float32 | 8754.333333333334 ns | 8724.666666666666 ns | 1.00 |
| array/random/rand/Int64 | 29575 ns | 30114 ns | 0.98 |
| array/random/rand/Float32 | 12759 ns | 13059 ns | 0.98 |
| array/permutedims/4d | 59734 ns | 60761 ns | 0.98 |
| array/permutedims/2d | 53974.5 ns | 54037 ns | 1.00 |
| array/permutedims/3d | 54693.5 ns | 54954 ns | 1.00 |
| array/sorting/1d | 2775235.5 ns | 2756544 ns | 1.01 |
| array/sorting/by | 3368263.5 ns | 3343249 ns | 1.01 |
| array/sorting/2d | 1087793 ns | 1080799 ns | 1.01 |
| cuda/synchronization/stream/auto | 1035.9 ns | 1040.3 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 7060.9 ns | 7220 ns | 0.98 |
| cuda/synchronization/stream/blocking | 831.2708333333334 ns | 802.3333333333334 ns | 1.04 |
| cuda/synchronization/context/auto | 1161.2 ns | 1203.5 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 7108.299999999999 ns | 7276.700000000001 ns | 0.98 |
| cuda/synchronization/context/blocking | 881.2641509433962 ns | 900.4347826086956 ns | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
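The Ratio column appears to be Current divided by Previous, rounded to two decimals, so values below 1.00 mean the current commit ran faster. A minimal Julia sketch that recomputes it for a few rows copied from the benchmark table, assuming that definition (the helper name `bench_ratio` is hypothetical):

```julia
using Printf

# Assumption: Ratio = Current / Previous, rounded to two decimals
bench_ratio(current, previous) = round(current / previous; digits = 2)

# (name, Current ns, Previous ns), copied from the benchmark table
rows = [
    ("kernel/launch",                           2185.4444444444443, 2270.3333333333335),
    ("array/reverse/2d_inplace",                11700.0,            13212.0),
    ("array/reductions/reduce/Int64/dims=1",    47530.5,            55193.0),
    ("array/reductions/mapreduce/Int64/dims=1", 50494.0,            45980.0),
]

for (name, current, previous) in rows
    r = bench_ratio(current, previous)
    # Lower time is better, so a ratio below 1 means the current commit was faster
    verdict = r < 1 ? "faster" : r > 1 ? "slower" : "unchanged"
    @printf("%-42s %.2f (%s)\n", name, r, verdict)
end
```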
cc @vchuravy
@christiangnrd perhaps we should open an issue with Julia, but it might also just mean that this relies on effect analysis for the call to be elided, and therefore, bleh.
This is also likely the same as JuliaGPU/GPUCompiler.jl#712
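One way to check on the host that the generated method leaves no call behind is to inspect its typed IR with `code_typed`. In this sketch `Vec3f`, `plain_aligned_sizeof`, and `gen_aligned_sizeof` are hypothetical names. The plain method's IR only shows what the host compiler does; a GPU compiler with a different inference setup may not fold the call the same way.

```julia
# Hypothetical example type: three Float32 fields, 12 bytes, 4-byte alignment
struct Vec3f
    x::Float32
    y::Float32
    z::Float32
end

# Plain method: the call is removed only if the compiler proves it can be folded
plain_aligned_sizeof(::Type{T}) where {T} = Base.aligned_sizeof(T)

# Generated method: the size is computed when the body is generated and returned as a literal
@generated function gen_aligned_sizeof(::Type{T}) where {T}
    return :($(Base.aligned_sizeof(T)))
end

# Typed IR of the generated method: a single `return 12`, with no call left in it
ci, rettype = only(Base.code_typed(gen_aligned_sizeof, (Type{Vec3f},)))
println(ci)
println(rettype)

# Typed IR of the plain method: shows whether the host compiler folded the call
println(only(Base.code_typed(plain_aligned_sizeof, (Type{Vec3f},))).first)
```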
See #2790 (comment)
Close #2790