Simplifying decodePixelMeasurement #405

alberth · 2015-09-24T11:55:12Z

Not sure whether you want this, but I simplified the decodePixelMeasurement a lot.

floe · 2015-09-24T12:59:42Z

Thanks, this is certainly helpful. The code was so convoluted because it was originally ported from the shader implementation. However, I'll hold this for 0.2 at the moment.

alberth · 2015-09-25T06:40:45Z

Rebased onto the new master.

xlz · 2015-09-25T15:11:09Z

I think correctness is paramount here because this code is the source for other reimplementations.

That is a lot of commits to review. I hope the refactored code is provably equivalent.

xlz · 2015-09-26T03:57:04Z

src/cpu_depth_packet_processor.cpp

@@ -332,63 +326,26 @@ class CpuDepthPacketProcessorImpl: public WithPerfLogging

  int32_t decodePixelMeasurement(unsigned char* data, int sub, int x, int y)
  {
+    if (x < 1 || y < 0 || 510 < x || 423 < y)
+    {
+      return lut11to16[0];


lut11to16[0] = 0

xlz · 2015-09-26T04:05:07Z

I did my own derivation and the refactoring seems correct.

There are just too many commits, which pollutes the commit history.

floe · 2015-09-26T09:12:41Z

Can you still squash them after creating a PR?

xlz · 2015-09-26T12:23:36Z

Yes

alberth · 2015-09-28T08:11:32Z

(It seems the line note with the Python computation went missing due to my push, so I'll paste it below.)

I see no reason to discard x == 0 in this function.

I computed in Python what happens for x = 511:

>>> x = 511
>>> r1zi = (x >> 2) + ((x & 0x3) << 7)
>>> r1zi = r1zi * 11
>>> r1yi = r1zi >> 4
>>> r1yi
351
>>> r1zi = r1zi & 15
>>> r1zi
5

The value of ptr[r1yi + 1] is left-shifted by 11 positions (16 - r1zi), so the & 2047 mask then discards it entirely:

>>> i1 =  0       # Real data 0
>>> i2 = 0xffff  # Garbage
>>> i1 = i1 >> r1zi
>>> i2 = i2 << (16 - r1zi)
>>> (i1 | i2) & 2047
0   # No garbage bits

It seems x == 511 can be allowed here.
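
The same check can be repeated for every possible bit offset rather than only for x = 511. The following is a minimal sketch based on the expressions quoted in this thread; the helper names `swizzle`, `split` and `extract` are made up for illustration and are not part of the actual source.

```python
def swizzle(x):
    """Column x -> index of the 11-bit field within a row (expression from this thread)."""
    return (x >> 2) + ((x & 0x3) << 7)


def split(x):
    """Return (word offset r1yi, bit offset r1zi) for column x."""
    bit = swizzle(x) * 11
    return bit >> 4, bit & 15


def extract(i1, i2, r1zi):
    """Combine two 16-bit words into one 11-bit value, as in the session above."""
    return ((i1 >> r1zi) | (i2 << (16 - r1zi))) & 2047


# With i1 = 0 and i2 = all ones, a non-zero result means bits of the
# second word leak into the 11-bit value for that bit offset.
second_word_matters = {
    r1zi: extract(0, 0xFFFF, r1zi) != 0 for r1zi in range(16)
}

print(split(511))            # (351, 5)
print(second_word_matters)   # False for offsets 0..5, True for 6..15
```

Under these assumptions the second word only contributes when the bit offset is larger than 5, which is the arithmetic argument for x == 511. It says nothing about whether reading that word is a valid memory access.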

The filter stages that follow copy the edge pixels. Allowing x == 0 || x == 511 here means the next-to-edge pixels get a slightly different value in filterPixelStage1.

Given the result of the Python computation for x == 511, I added a commit that allows the x == 0 || x == 511 pixels. It only pulls in i2 if needed, which saves 192 (37%) accesses on a single scan line.

It should work for this function; I am not sure what happens elsewhere in the code.
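
The figure of 192 skipped reads per scan line can be reproduced by counting the columns whose bit offset is at most 5. This is a sketch only, assuming the index expression quoted in this thread and the r1zi > 5 condition mentioned later; `bit_offset` is a hypothetical helper name.

```python
def bit_offset(x):
    """Bit offset (r1zi) of column x inside its 16-bit word."""
    index = (x >> 2) + ((x & 0x3) << 7)
    return (index * 11) & 15


# The second word is only needed when the 11-bit field crosses a word boundary.
needs_second = sum(1 for x in range(512) if bit_offset(x) > 5)
skipped = 512 - needs_second

print(needs_second, skipped, skipped / 512)  # 320 192 0.375
```

So 192 of 512 columns (37.5%) can do without the second read.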

I'll have a look at cleaning up the commits, but not immediately.

xlz · 2015-09-28T21:24:52Z

ptr[r1yi + 1] reads beyond the end of the array. This is an invalid memory access, not a matter of arithmetic.

Unless there is evidence that the edge pixels are inherently invalid, I believe excluding them was only done to give the code fewer branches. Filter1 does need a follow-up look at its edge cases.

xlz · 2015-09-28T21:40:37Z

I think the last commit is good.

The r1zi > 5 condition changes the memory access pattern. Does this make a difference in performance, better or worse?

Also now that it does not discard edge pixels, do those pixels contain sensible values (with filters disabled)?

alberth · 2015-09-29T06:24:17Z

Testing is the next step for me, but unfortunately that is currently complicated because my USB connection is not stable. It will need a somewhat simpler testbed than the freenect2 application.

I don't expect the memory access change to make much difference; in general you'd need the next value soon anyway for a following pixel. Fixing the jumping around in memory caused by for(int x = 0; x < 512; x++) together with index = (x >> 2) | ((x & 3) << 7), over 9 different sub-images, is probably much more interesting.

Due to the known range of the x coordinate, "r1zi >> 4" cannot go beyond 352. The only way to get there is with an out-of-bound pixel (x, y) coordinate. Therefore, "return lut11to16[0]" happens only when the boolean bounds condition is true.
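
To illustrate the jumping around, here is a small sketch that lists the word offset read for the first few consecutive x values within a single row. It assumes only the index expression quoted above and 11 bits per pixel; `field_index` and `word_offset` are hypothetical helper names.

```python
def field_index(x):
    """Column x -> index of its 11-bit field within a row."""
    return (x >> 2) | ((x & 3) << 7)


def word_offset(x):
    """16-bit word offset that holds the start of the field for column x."""
    return (field_index(x) * 11) >> 4


offsets = [word_offset(x) for x in range(8)]
print(offsets)  # [0, 88, 176, 264, 0, 88, 176, 264]

# The mapping visits every field exactly once, just not in memory order.
is_permutation = sorted(field_index(x) for x in range(512)) == list(range(512))
print(is_permutation)  # True
```

Consecutive columns are 88 words (176 bytes) apart, cycling through four regions of the row, so a plain left-to-right loop over x does not walk memory sequentially.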

alberth · 2015-09-30T08:12:50Z

Compressed the refactoring into 3 commits now. The one tricky commit (767ba4a) has an explanation in its message.

I ran tests without the filter stages, comparing the current depth calculation against #404 plus this PR (without the commit that updates all pixels); both the output and the execution time are the same.
With the update of all pixels added, and the read of the second uint16 skipped in 37% of the cases, the output changes (the left and right edge pixels are different now), but there is no significant time gain. Taking out the useless if (x < 0 || y < 0 || 511 < x || 423 < y) branch (the method is never called with out-of-bound positions) seems to have a bigger impact, but the gain is still not significant.

If you add filtering, total execution time grows, so the execution time changes get even smaller.

xlz · 2015-09-30T14:31:32Z

I meant whether the edge pixels hold physically valid values.

xlz · 2016-01-24T01:33:11Z

I'll give this some testing next Monday and merge it, #404 too.

xlz · 2016-01-25T18:21:33Z

The pixels at x == 0 || x == 511 have physically invalid values.

Yes, they should be removed.

xlz · 2016-01-25T18:32:51Z

I removed the last commit, "Copy pixels at x==0 || x== 511 as well.", and cherry-picked the two PRs into master.

floe · 2016-01-27T11:25:49Z

Just curious: did #404 + #405 have any effect on performance?

xlz · 2016-01-27T16:42:02Z

There seems to be no change in performance (both are very slow - 6Hz as I tested).

floe added this to the 0.2 milestone Sep 24, 2015

xlz reviewed Sep 26, 2015
floe mentioned this pull request Sep 28, 2015

OpenCL gives white noise in depth buffer #183

Closed

alberth added 4 commits September 29, 2015 11:57

Move 'data' acces function to the point where it is needed.

4437121

Merge booleans, eliminate bfi and r4wi.

4afd1a9

Split case of r1yi bigger than 352.

767ba4a

Due to known range of the x coordinate, "rizi >> 4" cannot go beyond 352. The only way to get there is due to having an out-of-bound pixel (x, y) coordinate. Therefore, "return lut11to16[0]" happens only for a true boolean condition.

Copy pixels at x==0 || x== 511 as well.

7c34bc9

xlz closed this Jan 25, 2016
