#49 adds CUDA GUPS to support this blog post.
Unfortunately, the actual CUDA GUPS numbers are not published anywhere, so let me share mine:
H100 (values in GUPS):
Size,updates,reads,writes,reads_writes,updates_no_loop
536870912,12.286535,31.907222,13.404891,10.998889,11.234400
1073741824,12.337535,31.815486,13.366453,11.018896,11.285350
2147483648,12.315124,31.770212,13.347206,11.003539,11.253983
Size,shmem_GPU_updates,shmem_SM_updates,shmem_GPU_reads,shmem_SM_reads,shmem_GPU_writes,shmem_SM_writes,shmem_GPU_reads_writes,shmem_SM_reads_writes,shmem_GPU_updates_no_loop,shmem_SM_updates_no_loop
1024,9.649010,0.073099,49.600047,0.375758,52.800050,0.400000,51.503349,0.390177,50.268812,0.380824
2048,11.245725,0.085195,26.420813,0.200158,28.143789,0.213211,27.582764,0.208960,26.875797,0.203605
4096,9.472967,0.071765,13.470942,0.102053,14.337969,0.108621,14.042449,0.106382,13.681726,0.103649
8192,4.697732,0.035589,5.802755,0.043960,6.171923,0.046757,6.053206,0.045858,5.894693,0.044657
16384,1.655971,0.012545,1.936600,0.014671,2.060059,0.015607,2.019944,0.015303,1.967753,0.014907
H200 (values in GUPS):
Size,updates,reads,writes,reads_writes,updates_no_loop
536870912,12.812498,36.179952,15.135795,13.849506,14.018524
1073741824,13.562377,39.055498,15.731977,13.575859,13.652478
2147483648,13.564026,38.989000,15.704839,13.573478,13.647836
Size,shmem_GPU_updates,shmem_SM_updates,shmem_GPU_reads,shmem_SM_reads,shmem_GPU_writes,shmem_SM_writes,shmem_GPU_reads_writes,shmem_SM_reads_writes,shmem_GPU_updates_no_loop,shmem_SM_updates_no_loop
1024,9.775508,0.074057,49.784024,0.377152,52.883266,0.400631,51.661943,0.391378,50.648199,0.383698
2048,11.242899,0.085173,26.452055,0.200394,28.155596,0.213300,27.594105,0.209046,26.875797,0.203605
4096,9.498778,0.071960,13.476352,0.102094,14.334906,0.108598,14.042449,0.106382,13.686609,0.103686
8192,4.685105,0.035493,5.799120,0.043933,6.168519,0.046731,6.047478,0.045814,5.891200,0.044630
16384,1.654609,0.012535,1.934911,0.014658,2.058854,0.015597,2.018418,0.015291,1.966006,0.014894
From this data it's clear that GUPS in shared memory is so slow (every shmem_SM_* column stays well below 1 GUPS, and all columns fall steeply at the larger sizes) that it does not make sense at all. So what is its purpose? Am I testing it the wrong way, or what was the intent behind adding shared-memory support to GUPS?
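One possible reading of the shared-memory tables, offered as an assumption and not as something the benchmark documents: each shmem_SM_* value looks like the matching shmem_GPU_* value divided by about 132, which is the SM count of an H100 SXM (and of an H200). If so, the SM columns would be per-SM rates and the GPU columns whole-GPU aggregates. The quick check below uses the H100 rows posted above. The helper name `gpu_to_sm_ratios` is hypothetical.

```python
# Hypothetical check: how do the shmem_GPU_* and shmem_SM_* columns relate?
# Rows copied from the H100 shared-memory table above:
# (size, shmem_GPU_updates, shmem_SM_updates, shmem_GPU_reads, shmem_SM_reads)
H100_SHMEM = [
    (1024, 9.649010, 0.073099, 49.600047, 0.375758),
    (2048, 11.245725, 0.085195, 26.420813, 0.200158),
    (4096, 9.472967, 0.071765, 13.470942, 0.102053),
    (8192, 4.697732, 0.035589, 5.802755, 0.043960),
    (16384, 1.655971, 0.012545, 1.936600, 0.014671),
]


def gpu_to_sm_ratios(rows):
    """Return (size, updates GPU/SM ratio, reads GPU/SM ratio) for each row."""
    return [(size, gu / su, gr / sr) for size, gu, su, gr, sr in rows]


for size, upd, rd in gpu_to_sm_ratios(H100_SHMEM):
    print(f"{size:>6}: updates GPU/SM = {upd:.2f}, reads GPU/SM = {rd:.2f}")
```

Every ratio comes out at about 132. Under this reading, the per-SM figures are expected to be small, and the GPU-level columns are the ones to compare against the global-memory results.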