Collaborator

@gbrgr gbrgr commented Nov 6, 2025

Closes RAI-44110

Adds support for replace operations in snapshot histories for incremental scans.

Although a replace operation leaves the logical data unchanged, we still report file additions and deletions, because the physical layout changes and rows end up in different files. This is necessary for incremental scan users who base their change tracking on file identifiers.
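To make the reporting concrete, here is a minimal sketch of how file-level changes could be derived from a range of snapshot operations. The `Operation`, `FileChanges`, and `file_changes` names are hypothetical and are not the types used in this PR. The cancel-out rule for files that are added and then replaced within the same range is an assumption of the sketch.

```rust
use std::collections::BTreeSet;

/// Hypothetical snapshot operation kinds, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Operation {
    Append { added: Vec<String> },
    Replace { removed: Vec<String>, added: Vec<String> },
}

/// File-level changes accumulated over a range of snapshots.
#[derive(Debug, Default, PartialEq)]
struct FileChanges {
    added: BTreeSet<String>,
    deleted: BTreeSet<String>,
}

/// Folds a sequence of operations (oldest first) into net file changes.
fn file_changes(ops: &[Operation]) -> FileChanges {
    let mut changes = FileChanges::default();
    for op in ops {
        match op {
            Operation::Append { added } => {
                changes.added.extend(added.iter().cloned());
            }
            Operation::Replace { removed, added } => {
                for file in removed {
                    // Assumption: a file added earlier in the same range and
                    // then replaced cancels out instead of being reported.
                    if !changes.added.remove(file) {
                        changes.deleted.insert(file.clone());
                    }
                }
                // The rewritten files are reported as additions even though
                // the logical rows are unchanged.
                changes.added.extend(added.iter().cloned());
            }
        }
    }
    changes
}
```

Under these assumptions, a scan range that contains only the replace of `file-a.parquet` and `file-b.parquet` would report two deletions and one addition, which is what a consumer keyed on file identifiers needs to see.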

@gbrgr gbrgr changed the title Add support for replace Add support for replace in incremental scan Nov 6, 2025
@gbrgr gbrgr marked this pull request as ready for review November 6, 2025 12:35
@gbrgr gbrgr requested a review from vustef November 6, 2025 12:35
@vustef vustef left a comment

Just a couple clarifications

/// 1. Files to compact: Vec<String> of existing file names that are being compacted
/// 2. Target file: String name of the new compacted file
///
/// Example: `Replace(vec!["file-a.parquet", "file-b.parquet"], "file-a-b-compacted.parquet")`

Is this how Iceberg engines do it too? How do they retarget positional delete files to file-a-b-compacted.parquet?

Collaborator Author

Essentially, what Spark does is write file-a-b-compacted.parquet so that it contains the records of file-a and file-b minus the rows removed by positional deletes (and equality deletes). However, the existing delete files of file-a and file-b remain in place.
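As a rough illustration of that behavior, the sketch below compacts in-memory files while applying positional deletes. `DataFile` and `compact` are hypothetical names, not Spark or iceberg-rust APIs, and equality deletes are left out for brevity.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for a data file: a name plus its rows in order.
struct DataFile {
    name: String,
    rows: Vec<String>,
}

/// Builds the compacted file from every row of `inputs` that is not covered
/// by a positional delete. `deletes` maps a data file name to the set of
/// deleted row positions (0-based) within that file.
fn compact(
    inputs: &[DataFile],
    deletes: &HashMap<String, HashSet<usize>>,
    target: &str,
) -> DataFile {
    let mut rows = Vec::new();
    for file in inputs {
        let deleted = deletes.get(&file.name);
        for (pos, row) in file.rows.iter().enumerate() {
            let is_deleted = deleted.map_or(false, |set| set.contains(&pos));
            if !is_deleted {
                rows.push(row.clone());
            }
        }
    }
    // `deletes` is only read: the delete entries for the input files are not
    // rewritten or retargeted to the new file.
    DataFile { name: target.to_string(), rows }
}
```

Since the deleted rows are already gone from the compacted file, nothing in this sketch has to point the old delete files at it.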

@gbrgr gbrgr changed the title Add support for replace in incremental scan feat(core): Add support for replace in incremental scan Nov 7, 2025