jlgreathouse
Contributor

Fixes Issue #109

This patch both resolves Issue #109 and adds a feature enhancement that lets users optionally compute SpM-DV using compensated summation.

See closed PR #123 for more details.
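
For readers unfamiliar with the technique, the following is a minimal host-side C++ sketch of one form of compensated summation, built on Knuth's TwoSum error-free transformation. It is not the kernel code from this patch: the names `two_sum`, `compensated_sum`, and `naive_sum` are hypothetical, and the sketch assumes IEEE-754 round-to-nearest arithmetic compiled without value-changing optimizations such as `-ffast-math`.

```cpp
#include <cassert>
#include <vector>

// Knuth's TwoSum: returns s = fl(a + b) and stores in err the rounding
// error of that addition, so that a + b == s + err exactly (barring overflow).
template <typename T>
T two_sum(T a, T b, T& err) {
    T s  = a + b;
    T bp = s - a;                  // portion of b that was absorbed into s
    T ap = s - bp;                 // portion of a that was absorbed into s
    err  = (a - ap) + (b - bp);    // what each operand lost to rounding
    return s;
}

// Sum the values while accumulating the per-step rounding errors in a
// separate term, then fold that term back in once at the end.
template <typename T>
T compensated_sum(const std::vector<T>& values) {
    T sum  = T(0);
    T comp = T(0);                 // running total of rounding errors
    for (T v : values) {
        T err;
        sum = two_sum(sum, v, err);
        comp += err;
    }
    return sum + comp;
}

// Plain left-to-right summation, for comparison.
template <typename T>
T naive_sum(const std::vector<T>& values) {
    T sum = T(0);
    for (T v : values) sum += v;
    return sum;
}
```

With the double-precision input `{1e17, 1.0, -1e17}`, the plain sum loses the `1.0` when it is added to `1e17`, while the compensated version carries it in the error term and recovers it at the end.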

…ts us from adding an extra copy of the final workgroup if it turns out that our total number of rows exactly fits within the final rowBlock. This caused problems with global asynchronous reduction.
…lity. Using double precision intermediate calculations for the float answer. Using compensated summation to (in effect) perform quad precision intermediate calculations for the double answer. Calculating ULPs difference between CPU and GPU results.
…oating-point underflow, this calculates answers in twice the precision before rounding to the native precision. In other words, like calculating all of the intermediate results in double before finally rounding to float. This results in many matrices being bitwise identical to what is calculated on the CPU.
…xpanding the rowBlocks buffer size by 2x in order to have a place to reduce error values between workgroups in the CSR-LongRows case.
…rameters out of the kernel. Made the 2sum algorithm in csrmv_general slightly faster.
…p in CSR-LongRows to work on more than a single block of NNZs. This is more efficient and results in higher performance. Also split up CSR-Vector from the LongRows algorithm. Added some more tuning knobs to CSR-Adaptive with respect to these changes.
…l reduction mechanism when there are relatively few rows within the row block.
… the number of threads assigned to the parallel CSR-Stream reduction on the CPU instead of making each GPU workgroup do it. Change the action away from being a division and replace with some faster bit math.
…eries of short rows and then a new long-ish row. CSR-Stream runs into performance issues when trying to reduce multiple rows of extremely varying lengths, so we just put these rows into different workgroups.
…ws at a time to CSR-Vector to only 1. It turns out that, after recent modifications to CSR-Stream, it is more efficient to use CSR-Stream for this case.
…pes from size_t to unsigned int, since we currently do not work on extremely large data structures. Changing around some other data types. In general, all of this results in some minor performance gains on Fiji GPUs.
…rformance increases on DPFP-starved GPUs when working in double precision mode.
… of multiplications that snuck their way into the code. 32-bit integer multiply is slow (up to 16x slower than addition on AMD GPUs), so replaced with full-speed addition or full-speed 24-bit multiply when required. Results in major performance gains on Fiji GPUs.
…c work to better decide whether our target hardware supports appropriate atomics. Currently does not work with targets that support fp64 but not 64-bit integer atomics.
… not support 64-bit atomics. Fall back to using only CSR-Vector in this case. Also made some changes to CSR-LongRows beta initialization to fix a memory consistency issue.
Conflicts:
	src/library/kernels/csrmv_adaptive.cl
…e opaque command structure. Adding it as a command-line option for the test-blas2 program.
… summation. Fixed naming convention for this in test-blas2.
… single-precision compensated summation can have errors due to lack of denorm support.
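
One of the commits above mentions calculating the ULPs difference between CPU and GPU results. As a rough illustration of what such a check can look like for single precision, here is a C++ sketch that measures the distance between two floats in units in the last place by comparing their bit patterns. The helper names `ordered_bits` and `ulp_distance` are hypothetical, the `-1` return value for NaN inputs is an arbitrary convention chosen for this sketch, and nothing here is taken from the test-blas2 source.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float onto a signed integer line on which adjacent representable
// floats differ by exactly 1. +0.0f and -0.0f both map to 0.
inline std::int64_t ordered_bits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);              // well-defined bit copy
    std::int64_t magnitude = u & 0x7FFFFFFFu;   // exponent and mantissa bits
    return (u & 0x80000000u) ? -magnitude : magnitude;
}

// Number of representable floats separating a and b.
// Returns -1 if either input is NaN (a convention of this sketch only).
inline std::int64_t ulp_distance(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) return -1;
    std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

A result of 0 means the two values are bitwise identical apart from the sign of zero. The same idea extends to double precision with 64-bit patterns, provided the subtraction is guarded against overflow.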
@kknox
Contributor

kknox commented Aug 19, 2015

@jlgreathouse
Looks like clang has a problem with the printf syntax (not portable). Checking in a new commit to the branch [ jlgreathouse:newer_adaptive_squash ] will cause Travis to kick off a new build test. It will tell you when it can successfully build.

@jlgreathouse
Contributor Author

Done. Changed printf over to appropriately configured couts. The PRIu64 mechanism should be the C99 portable way of printing data of the right size for uint types (I think I forgot to include inttypes.h, so clang failed). However, MSVC doesn't support C99 properly, so C++ mechanisms it is.
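
To make the two options concrete, here is a small C++ sketch that formats a 64-bit unsigned value both ways. The function names are hypothetical and the actual print statements from the patch are not reproduced here. The `PRIu64` route needs the `<cinttypes>` header (`inttypes.h` in C), while the stream route relies only on `operator<<` overload resolution and so needs no format macro at all.

```cpp
#include <cassert>
#include <cinttypes>   // PRIu64
#include <cstdint>     // std::uint64_t, UINT64_MAX
#include <cstdio>      // std::snprintf
#include <sstream>
#include <string>

// C99-style: PRIu64 expands to the correct conversion specifier for
// uint64_t on the current platform.
std::string format_with_printf(std::uint64_t value) {
    char buf[32];                                  // 20 digits + NUL fit easily
    std::snprintf(buf, sizeof buf, "%" PRIu64, value);
    return std::string(buf);
}

// C++-style: the stream picks the right overload from the argument type.
std::string format_with_stream(std::uint64_t value) {
    std::ostringstream os;
    os << value;
    return os.str();
}
```

Both functions produce the same text for any input; the stream version simply avoids depending on the compiler's support for the C99 format macros.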

kknox pushed a commit that referenced this pull request Aug 19, 2015
Adding Compensated Summation to SpM-DV
:+1:
@kknox kknox merged commit f9038d8 into clMathLibraries:develop Aug 19, 2015