[ClickHouse–Iceberg] Fix performance issue in decimal bounds processing for integer decimals #388
Context: ClickHouse–Iceberg integration (manifest decimal bounds)
Fix performance issue in decimal bounds processing for integer decimals
When processing decimal bounds in Iceberg manifest files, the code
had a severe performance issue for integer decimals (decimals with
scale = 0, i.e. whole numbers with no fractional digits). The condition
while (--scale)
when scale = 0 becomes
while (-1)
which is true, causing the loop to iterate through all possible int32_t values (~4.3 billion iterations)
before stopping when scale wraps back to 0.
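The wraparound described above can be reproduced at a smaller scale. The following is a minimal sketch, not the ClickHouse code: `count_iterations` is a hypothetical helper, and a narrow unsigned type stands in for int32_t so that the loop finishes instantly and the wraparound is well defined in standard C++.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ClickHouse): counts how many times the
// body of `while (--scale)` runs for a given starting scale.
// T is assumed to be an unsigned integer type, so decrementing 0 wraps
// around to the maximum value instead of overflowing a signed integer.
template <typename T>
constexpr std::uint64_t count_iterations(T scale)
{
    std::uint64_t iterations = 0;
    while (--scale)      // the pre-decrement happens before the test
        ++iterations;
    return iterations;
}
```

With an 8-bit counter, `count_iterations<std::uint8_t>(0)` gives 255 (2^8 - 1), while a starting scale of 3 gives only 2. The same pattern with a 32-bit counter gives 2^32 - 1 iterations, which matches the ~4.3 billion figure.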
This resulted in a delay of approximately 3 seconds per integer decimal
value during bounds processing, significantly impacting performance.
Additionally, this could corrupt statistics because
unscaled_value += scaler
would add an incorrect large value due to the massive scaler accumulated
over billions of iterations, leading to wrong bounds calculations.
Fix by adding a check
if (scale)
before the loop, ensuring the scaling logic only runs for non-zero scale values (fractional numbers).
For integer decimals (scale = 0), no scaling is needed, so the scaler
remains 0, which is the correct behavior.
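The guarded loop might look roughly like the sketch below. This is an illustration under stated assumptions, not the actual patch: `compute_scaler` is a hypothetical name, the starting value of 10 for a non-zero scale is assumed, and scale is assumed to be non-negative.

```cpp
#include <cstdint>

// Hypothetical helper illustrating the guard; not the actual ClickHouse code.
// Assumes scale >= 0.
constexpr std::int64_t compute_scaler(std::int32_t scale)
{
    std::int64_t scaler = 0;   // stays 0 for integer decimals (scale = 0)
    if (scale)                 // the added check: skip scaling entirely when scale is 0
    {
        scaler = 10;           // assumed starting value for fractional decimals
        while (--scale)        // safe now: scale is at least 1 on entry
            scaler *= 10;
    }
    return scaler;
}
```

Under these assumptions, `compute_scaler(0)` returns 0 without entering the loop, `compute_scaler(1)` returns 10, and `compute_scaler(3)` returns 1000.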
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
...
Documentation entry for user-facing changes