@danieldresser-ie
Contributor

I feel like maybe I should have explained a bit more about the now-somewhat-odd interface to Reindexer, but maybe it doesn't matter since it's fully internal.

Maybe take a look and see if this makes sense to you?

@danieldresser-ie
Contributor Author

danieldresser-ie commented Nov 5, 2024

Reverted the segment() tests to their previous hardcoded values, so those should be passing now.

Here are the performance numbers if I run the full, slow, perf tests:

BEFORE:

FLOAT SORT TEST 6.08
SHORTCUTTED CONSTRUCTION 0.18
SHORTCUT FAIL LATE 0.4
SHORTCUT FAIL EARLY 0.37
SPLIT INTO 250000 0.21
SPLIT INTO 25000 0.22
SPLIT INTO 2500 0.24
SPLIT INTO 250 0.18
SPLIT INTO 25 0.18
SPLIT INTO 2 0.18
DOUBLE RANGE 0.3
QUADRUPLE RANGE 0.38
OCTUPLE RANGE 3.2
REINDEX 10 0.08
REINDEX 100 0.06
REINDEX 1000 0.07
REINDEX 10000 0.12
REINDEX 100000 0.67
REINDEX 1000000 6.1
REINDEX random 10 0.18
REINDEX random 100 0.88
REINDEX random 1000 3.44
REINDEX random 10000 4.4
REINDEX random 100000 3.99
REINDEX random 1000000 8.11
REINDEX pathological 10 0.15
REINDEX pathological 100 0.86
REINDEX pathological 1000 4.94
REINDEX pathological 10000 3.49
REINDEX pathological 100000 3.33
REINDEX pathological 1000000 7.8

AFTER:

FLOAT SORT TEST 6.02
SHORTCUTTED CONSTRUCTION 0.16
SHORTCUT FAIL LATE 0.41
SHORTCUT FAIL EARLY 0.39
SPLIT INTO 250000 0.2
SPLIT INTO 25000 0.21
SPLIT INTO 2500 0.22
SPLIT INTO 250 0.17
SPLIT INTO 25 0.18
SPLIT INTO 2 0.16
DOUBLE RANGE 0.31
QUADRUPLE RANGE 0.4
OCTUPLE RANGE 3.18
REINDEX 10 0.11
REINDEX 100 0.09
REINDEX 1000 0.1
REINDEX 10000 0.19
REINDEX 100000 1.03
REINDEX 1000000 9.06
REINDEX random 10 0.22
REINDEX random 100 1.16
REINDEX random 1000 4.75
REINDEX random 10000 6.33
REINDEX random 100000 6.03
REINDEX random 1000000 12.52
REINDEX pathological 10 0.2
REINDEX pathological 100 1.14
REINDEX pathological 1000 7.25
REINDEX pathological 10000 5.42
REINDEX pathological 100000 5.43
REINDEX pathological 1000000 12.04

Basically, the smaller tests are pretty much equivalent, but some of the big tests get around 50% slower. Probably an acceptable tradeoff?
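To make the "50% slower" figure concrete, here is a minimal sketch of the arithmetic behind it. The helper name `slowdownPercent` is hypothetical and is not part of the perf tests; it only compares a BEFORE timing with the matching AFTER timing from the lists above.

```cpp
#include <cassert>

// Hypothetical helper, not part of the test suite.
// Returns the percentage by which `after` is slower than `before`.
// Positive means slower, negative means faster.
double slowdownPercent( double before, double after )
{
	assert( before > 0.0 );
	return ( after / before - 1.0 ) * 100.0;
}
```

With the timings listed above, `slowdownPercent( 6.1, 9.06 )` comes out at roughly 48.5 for REINDEX 1000000 and `slowdownPercent( 8.11, 12.52 )` at roughly 54.4 for REINDEX random 1000000, while OCTUPLE RANGE (3.2 to 3.18) is marginally faster.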

Member

@johnhaddon johnhaddon left a comment

Thanks Daniel! The requirement to call computeIndices() manually in some cases but not others does feel a bit odd - if we can make computeIndices() private and fully automatic without too much cost then I think that would be worthwhile. I made a couple of comments inline about that...


// Don't add the index, but just test if it is a part of the reindex. If it is an
// id which has already been added, return the new id, otherwise return -1
// You must call computeIndices() after calling addIndex and before calling this function.
Member

Why is this the only public method that puts the onus on the caller to call computeIndices()? If this also called computeIndices() for you, then it seems computeIndices() could be private and the Reindexer API wouldn't have changed at all.

I'm assuming it's related to performance, but I can't see anywhere where addIndex() and testIndex() are interleaved in a way that would cause repeated computeIndices() calls. Maybe I missed something? If it is performance, then perhaps it could be alleviated by only calling computeIndices() when we find we're about to return an ID that hasn't been computed yet?

Member

Perhaps it's not about performance, but just that addIndex() doesn't reset m_indicesComputed back to false?
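The suggestion in these comments, where `addIndex()` invalidates the computed state and `testIndex()` recomputes only when needed, can be sketched roughly as follows. This is a simplified, hypothetical stand-in and not the actual Reindexer: only the names `addIndex()`, `testIndex()`, `computeIndices()` and `m_indicesComputed` come from the discussion. The class name `LazyReindexer`, the `std::map` storage, the `computeCount()` accessor, and the rule that new ids are assigned in ascending order of original id are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>

// Hypothetical sketch only; the real Reindexer stores its data differently.
class LazyReindexer
{

	public :

		// Records that original index `id` is in use. Only a genuinely new
		// id invalidates the previously computed new indices.
		void addIndex( int id )
		{
			assert( id >= 0 );
			if( m_used.emplace( id, -1 ).second )
			{
				m_indicesComputed = false;
			}
		}

		// Returns the new id for `id` if it has been added, otherwise -1.
		// Callers never need to call computeIndices() themselves.
		int testIndex( int id )
		{
			computeIndices();
			const auto it = m_used.find( id );
			return it == m_used.end() ? -1 : it->second;
		}

		// Illustration only : how many times the indices were recomputed.
		int computeCount() const
		{
			return m_computeCount;
		}

	private :

		// Assigns new ids in ascending order of original id (an assumption
		// of this sketch). A no-op when nothing changed since the last call.
		void computeIndices()
		{
			if( m_indicesComputed )
			{
				return;
			}
			int newId = 0;
			for( auto &entry : m_used )
			{
				entry.second = newId++;
			}
			m_indicesComputed = true;
			m_computeCount++;
		}

		std::map<int, int> m_used; // original id -> new id
		bool m_indicesComputed = false;
		int m_computeCount = 0;

};
```

In this sketch, repeated `testIndex()` calls with no `addIndex()` in between cost one boolean check each, so calling it in an inner loop only triggers recomputation when the two calls are actually interleaved.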

@johnhaddon
Member

Oh, and Changes needs updating, and the PR needs to target RB-10.5.

@danieldresser-ie danieldresser-ie changed the base branch from main to RB-10.5 November 7, 2024 23:13
@danieldresser-ie
Contributor Author

My instincts didn't like calling computeIndices() in an inner loop when our usage of this private class never requires it, but there's no measurable loss of performance from it, so I've cleaned up the API.

Also rebased and added a Changes entry.

@johnhaddon johnhaddon merged commit cdf6a7a into ImageEngine:RB-10.5 Nov 11, 2024
5 checks passed
@johnhaddon
Member

Merged. Thanks Daniel!

ivanimanishi added a commit to ivanimanishi/cortex that referenced this pull request Nov 21, 2024
Reverted recent update since ImageEngine#1441 made it unnecessary.