An adaptive deferred shading implementation based on the paper "Deferred Adaptive Compute Shading".
⬜: pixel processed in this pass 🟩: already shaded pixel ⬛: unshaded pixel
⬜⬛⬛⬛
⬛⬛⬛⬛
⬛⬛⬛⬛
⬛⬛⬛⬛
Dispatch Dimension: (Screen Width / 4) * (Screen Height / 4) * 1
Access Pattern:
- Compute Block Idx
- Apply Pixel Offset
- Shade Target Pixel
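
The access pattern above can be sketched on the CPU as follows. This is a minimal illustration, not the shader itself: it assumes the screen size is a multiple of 4, reads the dispatch dimension as thread counts along x, y and z, and uses the hypothetical helper names `dispatch_dimension` and `target_pixel`.

```python
BLOCK_SIZE = 4  # each thread of the first pass owns one 4x4 block


def dispatch_dimension(screen_width, screen_height, pixels_per_block):
    """Thread counts along x, y and z (assumes sizes are multiples of 4)."""
    return (screen_width // BLOCK_SIZE,
            screen_height // BLOCK_SIZE,
            pixels_per_block)


def target_pixel(block_x, block_y, offset):
    """Compute the block origin, then apply the per-pass pixel offset."""
    origin_x = block_x * BLOCK_SIZE
    origin_y = block_y * BLOCK_SIZE
    return (origin_x + offset[0], origin_y + offset[1])


# First pass: one pixel per block, at offset (0, 0).
first_pass_dim = dispatch_dimension(1920, 1080, 1)
first_pass_pixel = target_pixel(3, 2, (0, 0))
```

For a 1920x1080 target this gives a 480 x 270 x 1 dispatch, and the thread for block (3, 2) shades pixel (12, 8).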
🟩⬛⬛⬛
⬛⬛⬛⬛
⬛⬛⬜⬛
⬛⬛⬛⬛
Dispatch Dimension: (Screen Width / 4) * (Screen Height / 4) * 1
Access Pattern:
- Compute Block Idx
- Apply Pixel Offset
- Derive Statistics Property
- Interpolate Or Shade The Target Pixel Based On Stats Info.
Interpolate between neighbors if (i + 2, j + 2), (i - 2, j + 2), (i + 2, j - 2), (i - 2, j - 2) are similar.
🟩⬛⬜⬛
⬛⬛⬛⬛
⬜⬛🟩⬛
⬛⬛⬛⬛
Dispatch Dimension: (Screen Width / 4) * (Screen Height / 4) * 2
Interpolate between neighbors if (i + 2, j), (i - 2, j), (i, j - 2), (i, j + 2) are similar.
🟩⬛🟩⬛
⬛⬜⬛⬜
🟩⬛🟩⬛
⬛⬜⬛⬜
Dispatch Dimension: (Screen Width / 4) * (Screen Height / 4) * 4
Interpolate between neighbors if (i + 1, j + 1), (i - 1, j - 1), (i + 1, j - 1), (i - 1, j + 1) are similar.
🟩⬜🟩⬜
⬜🟩⬜🟩
🟩⬜🟩⬜
⬜🟩⬜🟩
Dispatch Dimension: (Screen Width / 4) * (Screen Height / 4) * 8
Interpolate between neighbors if (i + 1, j), (i - 1, j), (i, j - 1), (i, j + 1) are similar.
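
The five passes can be summarized as data. The sketch below reads the offsets off the diagrams as (x, y) = (column, row) inside a 4x4 block, which is an assumption about the coordinate convention; the names `PASS_OFFSETS`, `NEIGHBOR_OFFSETS`, `pass_of` and `neighbors_ready` are hypothetical. It checks that the passes cover every pixel of a block exactly once and that each pass only reads neighbors written by an earlier pass.

```python
# Pixel offsets handled by each pass, one entry per ⬜ in the diagrams.
PASS_OFFSETS = [
    [(0, 0)],
    [(2, 2)],
    [(2, 0), (0, 2)],
    [(1, 1), (3, 1), (1, 3), (3, 3)],
    [(1, 0), (3, 0), (0, 1), (2, 1), (1, 2), (3, 2), (0, 3), (2, 3)],
]

DIAGONAL = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
AXIAL = [(1, 0), (-1, 0), (0, -1), (0, 1)]

# Neighbor offsets read by each pass (the first pass always shades).
NEIGHBOR_OFFSETS = [
    None,
    [(2 * dx, 2 * dy) for dx, dy in DIAGONAL],
    [(2 * dx, 2 * dy) for dx, dy in AXIAL],
    DIAGONAL,
    AXIAL,
]


def pass_of(x, y):
    """Index of the pass that produces pixel (x, y)."""
    local = (x % 4, y % 4)  # Python's % keeps negative coordinates in range
    for index, offsets in enumerate(PASS_OFFSETS):
        if local in offsets:
            return index
    raise ValueError("pixel is not covered by any pass")


def neighbors_ready(pass_index):
    """True if every neighbor the pass reads comes from an earlier pass."""
    return all(
        pass_of(ox + dx, oy + dy) < pass_index
        for ox, oy in PASS_OFFSETS[pass_index]
        for dx, dy in NEIGHBOR_OFFSETS[pass_index]
    )


pixels_per_pass = [len(offsets) for offsets in PASS_OFFSETS]
```

The per-pass pixel counts 1, 1, 2, 4, 8 match the last factor of each dispatch dimension and sum to the 16 pixels of a block.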
Four neighboring pixels are considered similar if their variance is lower than the threshold.
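
A minimal sketch of the similarity test and the resulting interpolate-or-shade decision. The source does not say which quantity the variance is computed on or how the interpolation is weighted, so this assumes one scalar per neighbor (for example luminance), the population variance, and a plain average; `variance`, `are_similar` and `resolve_pixel` are hypothetical names.

```python
def variance(values):
    """Population variance of a list of scalars."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def are_similar(values, threshold):
    """Neighbors are similar if their variance is below the threshold."""
    return variance(values) < threshold


def resolve_pixel(neighbors, threshold, shade):
    """Return (value, was_shaded) for one target pixel.

    `shade` is a callable standing in for the full shading evaluation.
    """
    if are_similar(neighbors, threshold):
        # Cheap path: average the four already-shaded neighbors.
        return sum(neighbors) / len(neighbors), False
    # Expensive path: neighbors disagree, so shade the pixel fully.
    return shade(), True


flat_region = resolve_pixel([0.5, 0.5, 0.5, 0.5], 0.01, lambda: 1.0)
edge_region = resolve_pixel([0.0, 1.0, 0.0, 1.0], 0.01, lambda: 0.25)
```

In the flat region the pixel is interpolated; across the edge the variance is 0.25, so the pixel is shaded.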
⬜: shading 🟩: interpolation
Assume Warp Size = 16
⬜⬜⬜⬜
⬜🟩⬜🟩
🟩⬜|⬜|⬜
⬜⬜⬜⬜
Total shading count: WaveActiveCountBits(shading) = 13.
For thread 10 (marked with | | above): srcLaneIdx = 10 and WavePrefixCountBits(shading) = 7 (the count excludes the thread itself).
The number of interpolating threads before thread 10 is srcLaneIdx - WavePrefixCountBits(shading) = 3.
dstLaneIdx = shading ? WavePrefixCountBits(shading) : WaveActiveCountBits(shading) + (srcLaneIdx - WavePrefixCountBits(shading))
Thread 10 is shading, so dstLaneIdx = WavePrefixCountBits(shading) = 7; the other branch, which would evaluate to 13 + 3 = 16, is used only by interpolating threads.
Finally, srcLaneIdx = 10, dstLaneIdx = 7.
srcLaneIdx = 0, WavePrefixCountBits(shading) = 0, dstLaneIdx = 0
srcLaneIdx = 1, WavePrefixCountBits(shading) = 1, dstLaneIdx = 1
srcLaneIdx = 2, WavePrefixCountBits(shading) = 2, dstLaneIdx = 2
srcLaneIdx = 3, WavePrefixCountBits(shading) = 3, dstLaneIdx = 3
srcLaneIdx = 4, WavePrefixCountBits(shading) = 4, dstLaneIdx = 4
srcLaneIdx = 5, WavePrefixCountBits(shading) = 5, dstLaneIdx = 13
srcLaneIdx = 6, WavePrefixCountBits(shading) = 5, dstLaneIdx = 5
srcLaneIdx = 7, WavePrefixCountBits(shading) = 6, dstLaneIdx = 14
srcLaneIdx = 8, WavePrefixCountBits(shading) = 6, dstLaneIdx = 15
srcLaneIdx = 9, WavePrefixCountBits(shading) = 6, dstLaneIdx = 6
srcLaneIdx = 10, WavePrefixCountBits(shading) = 7, dstLaneIdx = 7
srcLaneIdx = 11, WavePrefixCountBits(shading) = 8, dstLaneIdx = 8
srcLaneIdx = 12, WavePrefixCountBits(shading) = 9, dstLaneIdx = 9
srcLaneIdx = 13, WavePrefixCountBits(shading) = 10, dstLaneIdx = 10
srcLaneIdx = 14, WavePrefixCountBits(shading) = 11, dstLaneIdx = 11
srcLaneIdx = 15, WavePrefixCountBits(shading) = 12, dstLaneIdx = 12
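
The table above can be reproduced with a small CPU-side simulation. This sketch emulates the two HLSL wave intrinsics with plain Python functions (`wave_prefix_count_bits` and `wave_active_count_bits` are stand-ins, not real bindings) and assumes the 16 lanes are laid out row by row as in the diagram.

```python
def wave_prefix_count_bits(flags, lane):
    """Number of set flags in lanes strictly before `lane`."""
    return sum(flags[:lane])


def wave_active_count_bits(flags):
    """Number of set flags across the whole wave."""
    return sum(flags)


def destination_lanes(shading):
    """Pack shading lanes first, interpolating lanes after them."""
    total_shading = wave_active_count_bits(shading)
    result = []
    for src_lane, is_shading in enumerate(shading):
        prefix = wave_prefix_count_bits(shading, src_lane)
        if is_shading:
            result.append(prefix)
        else:
            # Interpolating lanes before this one: src_lane - prefix.
            result.append(total_shading + (src_lane - prefix))
    return result


# True = shading (⬜), False = interpolation (🟩), row by row.
SHADING = [
    True, True, True, True,
    True, False, True, False,
    False, True, True, True,
    True, True, True, True,
]

dst = destination_lanes(SHADING)
```

Each destination lane is used exactly once: the 13 shading lanes end up in lanes 0 to 12 and the 3 interpolating lanes in lanes 13 to 15.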
⬜: shading 🟩: interpolation
Assume Warp Size = 16
⬜⬜⬜⬜
⬜🟩⬜🟩
🟩⬜⬜⬜
⬜⬜⬜⬜
⬜🟩🟩🟩
⬜🟩⬜🟩
🟩⬜🟩⬜
🟩⬜🟩🟩