Skip to content

Conversation

hide42
Copy link

@hide42 hide42 commented Sep 24, 2025

Context: ClickHouse–Iceberg integration (manifest decimal bounds)
Fix performance issue in decimal bounds processing for integer decimals

When processing decimal bounds in Iceberg manifest files, the code
had a severe performance issue for integer decimals (Decimal without
precision - whole numbers with scale = 0). The condition while (--scale)
when scale = 0 becomes while (-1) which is true, causing the loop to
iterate through all possible int32_t values (~4.3 billion iterations)
before stopping when scale wraps back to 0.

This resulted in approximately 3-second delay per integer decimal
value during bounds processing, significantly impacting performance.

Additionally, this could corrupt statistics because unscaled_value += scaler
would add an incorrect large value due to the massive scaler accumulated
over billions of iterations, leading to wrong bounds calculations.

Fix by adding a check if (scale) before the loop, ensuring the
scaling logic only runs for non-zero scale values (fractional numbers).
For integer decimals (scale = 0), no scaling is needed, so the scaler
remains 0, which is the correct behavior.

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings

NOTE: If your merge the PR with modified CI you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Run these jobs only (required builds will be added automatically):

  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Unit tests
  • Performance tests
  • All with aarch64
  • All with ASAN
  • All with TSAN
  • All with Analyzer
  • All with Azure
  • Add your option here

Deny these jobs:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64

Extra options:

  • do not test (only style check)
  • disable merge-commit (no merge from master before tests)
  • disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

  • 1
  • 2
  • 3
  • 4

When scale = 0 (integer decimals), `while (--scale)` iterates through
all int32_t values (~4.3B iterations) causing ~3s delay per value and
corrupting statistics via incorrect `unscaled_value += scaler`.
@CLAassistant
Copy link

CLAassistant commented Sep 24, 2025

CLA assistant check
All committers have signed the CLA.

@hide42 hide42 changed the title Fix performance issue in decimal bounds processing for integer decimals [ClickHouse–Iceberg] Fix performance issue in decimal bounds processing for integer decimals Sep 24, 2025
@kafka1991
Copy link
Contributor

kafka1991 commented Sep 25, 2025

Upon confirmation this issue was also fixed in CH's 25.8.2(Included in PR #383)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants