christiangnrd (Member)

github-actions bot (Contributor) commented Aug 8, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/src/CUDA.jl b/src/CUDA.jl
index 8a82201a0..97402ecd5 100644
--- a/src/CUDA.jl
+++ b/src/CUDA.jl
@@ -54,7 +54,7 @@ using Printf
 # - Base.aligned_sizeof is the size of an object in an array/inline alloced
 # Both of them are equivalent for immutable objects, but differ for mutable singtons and Symbol
 # We use `aligned_sizeof` since we care about the size of a type in an array
-@generated function aligned_sizeof(::Type{T}) where T
+@generated function aligned_sizeof(::Type{T}) where {T}
     return :($(Base.aligned_sizeof(T)))
 end
 


codecov bot commented Aug 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.77%. Comparing base (205c238) to head (659d67f).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2838      +/-   ##
==========================================
- Coverage   89.78%   89.77%   -0.02%     
==========================================
  Files         150      150              
  Lines       13229    13229              
==========================================
- Hits        11878    11876       -2     
- Misses       1351     1353       +2     


github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 659d67f | Previous: c05359d | Ratio |
|---|---|---|---|
| latency/precompile | 42971664043.5 ns | 42922336650.5 ns | 1.00 |
| latency/ttfp | 6993096215 ns | 7015168424 ns | 1.00 |
| latency/import | 3570402743 ns | 3571269514 ns | 1.00 |
| integration/volumerhs | 9607559 ns | 9608723 ns | 1.00 |
| integration/byval/slices=1 | 146787 ns | 146920.5 ns | 1.00 |
| integration/byval/slices=3 | 425733 ns | 425845 ns | 1.00 |
| integration/byval/reference | 144985 ns | 145020 ns | 1.00 |
| integration/byval/slices=2 | 286327 ns | 286380 ns | 1.00 |
| integration/cudadevrt | 103478 ns | 103554 ns | 1.00 |
| kernel/indexing | 14271 ns | 14235 ns | 1.00 |
| kernel/indexing_checked | 14751 ns | 14711 ns | 1.00 |
| kernel/occupancy | 673.8924050632911 ns | 672.5506329113924 ns | 1.00 |
| kernel/launch | 2185.4444444444443 ns | 2270.3333333333335 ns | 0.96 |
| kernel/rand | 14825 ns | 14669 ns | 1.01 |
| array/reverse/1d | 19626 ns | 19682 ns | 1.00 |
| array/reverse/2d | 23222.5 ns | 23613.5 ns | 0.98 |
| array/reverse/1d_inplace | 10009 ns | 10461 ns | 0.96 |
| array/reverse/2d_inplace | 11700 ns | 13212 ns | 0.89 |
| array/copy | 20785 ns | 20972 ns | 0.99 |
| array/iteration/findall/int | 157667 ns | 157808 ns | 1.00 |
| array/iteration/findall/bool | 139824 ns | 139837 ns | 1.00 |
| array/iteration/findfirst/int | 157411 ns | 164937 ns | 0.95 |
| array/iteration/findfirst/bool | 157968 ns | 165868 ns | 0.95 |
| array/iteration/scalar | 72335 ns | 73041 ns | 0.99 |
| array/iteration/logical | 213993.5 ns | 214850 ns | 1.00 |
| array/iteration/findmin/1d | 46243 ns | 46704 ns | 0.99 |
| array/iteration/findmin/2d | 96178.5 ns | 96962.5 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 43151.5 ns | 46033 ns | 0.94 |
| array/reductions/reduce/Int64/dims=1 | 47530.5 ns | 55193 ns | 0.86 |
| array/reductions/reduce/Int64/dims=2 | 61834.5 ns | 62917 ns | 0.98 |
| array/reductions/reduce/Int64/dims=1L | 88681 ns | 88869 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87096 ns | 87079 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 34057 ns | 34606 ns | 0.98 |
| array/reductions/reduce/Float32/dims=1 | 41224 ns | 43875 ns | 0.94 |
| array/reductions/reduce/Float32/dims=2 | 59333 ns | 59705 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52069 ns | 52260 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69431 ns | 70051.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43068 ns | 42671.5 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 50494 ns | 45980 ns | 1.10 |
| array/reductions/mapreduce/Int64/dims=2 | 61817.5 ns | 62143.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 88523 ns | 88812 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 86431.5 ns | 86818 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 33770 ns | 34742 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1 | 41333.5 ns | 43090.5 ns | 0.96 |
| array/reductions/mapreduce/Float32/dims=2 | 59688 ns | 60061 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 52355 ns | 52528 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 69878 ns | 70191 ns | 1.00 |
| array/broadcast | 20001.5 ns | 20155 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 11257 ns | 11294 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 215839 ns | 216503 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 283928 ns | 284237 ns | 1.00 |
| array/accumulate/Int64/1d | 124425 ns | 125529 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 83088 ns | 84037 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 157834 ns | 159166 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1708959.5 ns | 1720376 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 966206.5 ns | 968348 ns | 1.00 |
| array/accumulate/Float32/1d | 108670 ns | 109984 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80472 ns | 81082 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 147446 ns | 148760 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 1618565 ns | 1629307.5 ns | 0.99 |
| array/accumulate/Float32/dims=2L | 697859 ns | 701479 ns | 0.99 |
| array/construct | 1292.8 ns | 1287.2 ns | 1.00 |
| array/random/randn/Float32 | 47586 ns | 44176 ns | 1.08 |
| array/random/randn!/Float32 | 24637 ns | 24930 ns | 0.99 |
| array/random/rand!/Int64 | 27389 ns | 27547 ns | 0.99 |
| array/random/rand!/Float32 | 8754.333333333334 ns | 8724.666666666666 ns | 1.00 |
| array/random/rand/Int64 | 29575 ns | 30114 ns | 0.98 |
| array/random/rand/Float32 | 12759 ns | 13059 ns | 0.98 |
| array/permutedims/4d | 59734 ns | 60761 ns | 0.98 |
| array/permutedims/2d | 53974.5 ns | 54037 ns | 1.00 |
| array/permutedims/3d | 54693.5 ns | 54954 ns | 1.00 |
| array/sorting/1d | 2775235.5 ns | 2756544 ns | 1.01 |
| array/sorting/by | 3368263.5 ns | 3343249 ns | 1.01 |
| array/sorting/2d | 1087793 ns | 1080799 ns | 1.01 |
| cuda/synchronization/stream/auto | 1035.9 ns | 1040.3 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 7060.9 ns | 7220 ns | 0.98 |
| cuda/synchronization/stream/blocking | 831.2708333333334 ns | 802.3333333333334 ns | 1.04 |
| cuda/synchronization/context/auto | 1161.2 ns | 1203.5 ns | 0.96 |
| cuda/synchronization/context/nonblocking | 7108.299999999999 ns | 7276.700000000001 ns | 0.98 |
| cuda/synchronization/context/blocking | 881.2641509433962 ns | 900.4347826086956 ns | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.

maleadt (Member) commented Aug 9, 2025

cc @vchuravy

maleadt merged commit c8c2142 into JuliaGPU:master on Aug 9, 2025
3 checks passed
christiangnrd deleted the fixmem branch on August 9, 2025 at 11:23
vchuravy (Member)

@christiangnrd Perhaps we should open an issue with Julia, but it might also just mean that this relies on effect analysis to be elided, and therefore bleh.

vchuravy (Member)

This is also likely the same issue as JuliaGPU/GPUCompiler.jl#712.

maleadt pushed a commit that referenced this pull request Sep 3, 2025
Development

Successfully merging this pull request may close these issues.

Illegal memory access after aligned_sizeof changes
3 participants