Skip to content

Simplifying decodePixelMeasurement #405

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Simplifying decodePixelMeasurement #405

wants to merge 4 commits into from

Conversation

alberth
Copy link
Contributor

@alberth alberth commented Sep 24, 2015

Not sure whether you want this, but I simplified the decodePixelMeasurement a lot.

@floe floe added this to the 0.2 milestone Sep 24, 2015
@floe
Copy link
Contributor

floe commented Sep 24, 2015

Thanks, this is certainly helpful. The code was so convoluted because it was originally ported from the shader implementation. However, I'll hold this for 0.2 at the moment.

@alberth
Copy link
Contributor Author

alberth commented Sep 25, 2015

rebased to new master

@xlz
Copy link
Member

xlz commented Sep 25, 2015

I think correctness is paramount here because this is the source of other reimplementation.

These are many commits to review. I hope the refactored code is provably equivalent.

@@ -332,63 +326,26 @@ class CpuDepthPacketProcessorImpl: public WithPerfLogging

int32_t decodePixelMeasurement(unsigned char* data, int sub, int x, int y)
{
if (x < 1 || y < 0 || 510 < x || 423 < y)
{
return lut11to16[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lut11to16[0] = 0

@xlz
Copy link
Member

xlz commented Sep 26, 2015

I did my own derivation and the refactoring seems correct.

There are just too many commits, which kind of pollute the commit history.

@floe
Copy link
Contributor

floe commented Sep 26, 2015

Can you still squash them after creating a PR?

@xlz
Copy link
Member

xlz commented Sep 26, 2015

Yes

@alberth
Copy link
Contributor Author

alberth commented Sep 28, 2015

(It seems the line note with the Python computation went missing due to my push, I'll paste it below)


I see no reason to discard x == 0 in this function.

Python computed what happens for x = 511:

>>> x = 511
>>> r1zi = (x >> 2) + ((x & 0x3) << 7)
>>> r1zi = r1zi * 11
>>> r1yi = r1zi >> 4
>>> r1yi
351
>>> r1zi = r1zi & 15
>>> r1zi
5

The ptr[r1yi + 1] gets left-shifted 11 positions, and is then discarded due to & 2047:

>>> i1 =  0       # Real data 0
>>> i2 = 0xffff  # Garbage
>>> i1 = i1 >> r1zi
>>> i2 = i2 << (16 - r1zi)
>>> (i1 | i2) & 2047
0   # No garbage bits

It seems x == 511 can be allowed here.

The filter stages that follow copy the edge pixels. Allowing the x == 0 || x == 511 here means the next-to-edge pixels get a slightly different value in filterPixelstage1


Given the result of the Python computation for x == 511, I added a commit for allowing the x == 0 || x == 511 pixels. It only pulls in i2 if needed, which saves 192 (37%) accesses on a single scan line.

It should work for this function, not sure what happens at other places in the code.

I'll have a look at cleaning up the commits, but not immediately.

@xlz
Copy link
Member

xlz commented Sep 28, 2015

ptr[r1yi + 1] reads beyond array end. This is a invalid memory access, not arithmetic.

Unless there is evidence that the edge pixels are inherently invalid, I believe it's just making the code of less branches. Filter1 does need a followup look on its edge cases.

@xlz
Copy link
Member

xlz commented Sep 28, 2015

I think the last commit is good.

The r1zi > 5 condition changes the memory access pattern. Does this make a difference in performance, better or worse?

Also now that it does not discard edge pixels, do those pixels contain sensible values (with filters disabled)?

@alberth
Copy link
Contributor Author

alberth commented Sep 29, 2015

Testing is a next step for me, but unfortunately, that is currently complicated, as my usb is not stable. It will need a somewhat simpler testbed than the freenect2 application.

I don't expect the memory access to make much difference, you'd need the next value soon for a next pixel in general. Fixing the jumping around in memory access due to the for(int x = 0; x < 512; x++) together with index = (x >> 2) | ((x & 3) << 7) over 9 different sub-images is probably much more interesting.

Due to known range of the x coordinate, "rizi >> 4" cannot go beyond 352.
The only way to get there is due to having an out-of-bound pixel (x, y) coordinate.
Therefore, "return lut11to16[0]" happens only for a true boolean condition.
@alberth
Copy link
Contributor Author

alberth commented Sep 30, 2015

Compressed the refactoring into 3 commits now. Tricky commit (commit 767ba4a) is tricky, and that commit has an explanation in the message.

Ran tests without filter-stages, between current depth calculation and with #404 and this PR (without updating all pixels), and output as well as execution time is the same.
With the added update of all pixels, and 37% skipping getting the 2nd uint16, output changes (left/right edge pixels are different now), but no significant time gain. Taking out the useless if ( x < 0 || y < 0 || 511 < x || 423 < y) branch (method is never called with out-of-bound positions) seems to have a bigger impact, but still not significant gains.

If you add filtering, total execution time grows, so the execution time changes get even smaller.

@xlz
Copy link
Member

xlz commented Sep 30, 2015

I mean if the edge pixels are physically valid values.

@xlz
Copy link
Member

xlz commented Jan 24, 2016

I'll give this some testing next Monday and merge it, #404 too.

@xlz
Copy link
Member

xlz commented Jan 25, 2016

The pixels at x == 0 || x == 511 have physically invalid values.
2016-01-25-131931_1920x1080_scrot pngquant

Yes, they should be removed.

@xlz
Copy link
Member

xlz commented Jan 25, 2016

I removed the last commit "Copy pixels at x==0 || x== 511 as well." and cherry picked the two PRs into the master.

@xlz xlz closed this Jan 25, 2016
@floe
Copy link
Contributor

floe commented Jan 27, 2016

Just curious: did #404 + #405 have any effect on performance?

@xlz
Copy link
Member

xlz commented Jan 27, 2016

There seems to be no change in performance (both are very slow - 6Hz as I tested).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants