feat(core): Implement RollingFileWriter to help split data into multiple files
#1547
Conversation
liurenjie1024
left a comment
Thanks @CTTY for this PR, just finished the first round of review.
if self.should_roll(input_size) {
    if let Some(inner) = self.inner.take() {
        // close the current writer, roll to a new file
        let handle = spawn(async move { inner.close().await });
This is an interesting optimization, but I would suggest not doing it for now. A writer usually consumes resources like memory, connections, etc. Closing writers asynchronously makes things difficult to understand in production; for example, we may end up with many unclosed writers that consume a lot of memory and lead to OOM.
Good point!
I have a rough idea to further improve this: we can use a config to control the maximum parallelism here:
struct RollingWriter {
    ...
    buffer: Vec<DataFileBuilder>,
}
...
while close_handles.len() >= self.max_parallelism() {
    // wait until some closers complete, and store the data files in a buffer
    self.buffer.extend(future::select(self.close_handles))
}
self.close_handles.push(new_handle);
I have not thought through how to prevent the buffer from eating up memory yet, or whether we even need it.
Either way, I agree this can be completed as a follow-up.
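As a slightly more concrete version of the sketch above, here is a minimal, self-contained take on the bounded-parallelism idea using futures::future::select_all; the struct layout, the DataFileBuilder placeholder, and the max_parallelism field are assumptions for illustration, not the actual RollingFileWriter API:

```rust
use futures::future::select_all;
use tokio::task::JoinHandle;

// Placeholder for whatever a closed inner writer produces.
struct DataFileBuilder;

struct RollingWriterSketch {
    // In-flight close tasks for writers that have already been rolled over.
    close_handles: Vec<JoinHandle<DataFileBuilder>>,
    // Results collected while waiting for capacity to free up.
    buffer: Vec<DataFileBuilder>,
    // Assumed to be >= 1.
    max_parallelism: usize,
}

impl RollingWriterSketch {
    async fn push_close_handle(&mut self, new_handle: JoinHandle<DataFileBuilder>) {
        // Block until the number of concurrent close tasks drops below the limit.
        while self.close_handles.len() >= self.max_parallelism {
            let handles = std::mem::take(&mut self.close_handles);
            // Resolves as soon as any one handle finishes; the rest are kept.
            let (finished, _idx, remaining) = select_all(handles).await;
            self.buffer.push(finished.expect("close task panicked"));
            self.close_handles = remaining;
        }
        self.close_handles.push(new_handle);
    }
}
```

This caps how many writers are being closed concurrently, but the buffer still grows with the number of rolled files, which is the open question above.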
I've removed the close_handles from this PR and created an issue to track this potential optimization: #1551
Memory control is a complex topic, and from what I've learned, simply allowing a fixed number of in-flight tasks doesn't work well when integrated into other systems. I would prefer not to spend too much time on this for now.
impl<B: FileWriterBuilder> FileWriter for RollingFileWriter<B> {
    async fn write(&mut self, input: &RecordBatch) -> Result<()> {
        let input_size = input.get_array_memory_size();
This is incorrect: the written Parquet file is usually much smaller than Arrow's in-memory array size, since Parquet does a lot of encoding and compression. The target_size is not for exact control, so it's fine for a file to end up a little larger than this size.
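To illustrate the gap, here is a standalone sketch (not part of this PR) that compares a batch's in-memory size with the size of the Parquet bytes it produces; the repetitive data is chosen so the difference is obvious:

```rust
use std::sync::Arc;

use arrow_array::{Int64Array, RecordBatch};
use arrow_schema::{DataType, Field, Schema};
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Highly repetitive data encodes/compresses very well, so the written
    // size is far below the in-memory Arrow size here.
    let schema = Arc::new(Schema::new(vec![Field::new("v", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int64Array::from(vec![42_i64; 1_000_000]))],
    )?;

    let mut buf = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut buf, schema, None)?;
    writer.write(&batch)?;
    writer.close()?;

    println!("in-memory array size: {} bytes", batch.get_array_memory_size());
    println!("written parquet size: {} bytes", buf.len());
    Ok(())
}
```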
You are right, I added a comment to explain this.
No, I mean we should not use this input_size to determine whether to roll; we should use only the writer's current_written_size.
Hi @liurenjie1024, after testing with the suggested changes, I found an interesting issue.
tl;dr: the existing ParquetWriter can only report the correct current_written_size when it's closing and flushing data, not while writing data.
This can cause the following case to fail:
let writer: RollingWriter = ...
// should create 1 file,
// but won't update current_written_size because we don't close the writer in write()
writer.write(batch1).await?;
// this write should roll over, but since inner.current_written_size was not updated,
// it will write the data to the same file as the previous batch
writer.write(batch2).await?;
A more detailed analysis:
- ParquetWriter uses ArrowAsyncWriter as its inner writer
- ArrowAsyncWriter has an async_writer (ArrowRowGroupWriter) and a sync_writer (TrackWriter in this case)
- ArrowAsyncWriter's sync_writer will buffer rows based on the config value max_row_group_size (default is 1024 x 1024), so TrackWriter won't be able to track the data in the buffer until closing
Basically this issue can happen a lot when the max_row_group_size is large and the target_file_size is small.
To fix this, I think we'll need to change the ParquetWriter's implementation of current_file_size() and use AsyncArrowWriter's in_progress_size to take buffered data into account. But again, in_progress_size is the in-memory size, not the physical size.
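For context, the row-group size mentioned above is an ordinary writer property in the parquet crate; a small sketch of how it is configured (the 10_000 value is arbitrary, chosen only to contrast with the 1024 x 1024 default):

```rust
use parquet::file::properties::WriterProperties;

fn main() {
    // With the default max_row_group_size (1024 * 1024 rows), a small target
    // file size can fit entirely inside one still-buffered row group, so the
    // tracked written size stays at zero until the row group is flushed.
    let props = WriterProperties::builder()
        .set_max_row_group_size(10_000)
        .build();
    assert_eq!(props.max_row_group_size(), 10_000);
}
```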
> To fix this, I think we'll need to change the ParquetWriter's implementation of current_file_size() and use AsyncArrowWriter's in_progress_size to take buffered data into account. But again, in_progress_size is the in-memory size, not the physical size.
This sounds reasonable to me. According to the doc, in_progress_size + bytes_written seems like a better estimate of the current file size. Due to the complex encoding of Parquet, it's hard to get an accurate file size before finishing a row group, so an estimate is good enough.
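A minimal sketch of what that estimate could look like, assuming direct access to the underlying AsyncArrowWriter (the wrapper struct here is hypothetical, not iceberg-rust's actual ParquetWriter):

```rust
use parquet::arrow::AsyncArrowWriter;
use tokio::fs::File;

// Hypothetical wrapper; the real ParquetWriter wraps its Arrow writer differently.
struct ParquetWriterSketch {
    inner: AsyncArrowWriter<File>,
}

impl ParquetWriterSketch {
    /// Estimated physical file size: bytes already flushed to storage plus the
    /// in-memory size of the row group still being buffered. The second term is
    /// measured before encoding/compression, so this is only an estimate.
    fn current_file_size(&self) -> usize {
        self.inner.bytes_written() + self.inner.in_progress_size()
    }
}
```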
I've created an issue and will fix the parquet writer behavior in a separate PR
I've manually tested the rolling writer using some small data with the ParquetWriter::current_written_size fix. Now the generated file size is much closer to the configured target size; the difference between the configured target and the actual file size is small.
Co-authored-by: Renjie Liu <[email protected]>
Force-pushed from ed6b0eb to ac22f27.
Wondering what your plan is to make RollingFileWriter partition-aware.
Hi @stevie9868, I hope my reply in a different thread can answer your question.
Thanks @CTTY for the tests. The difference is acceptable to me.
liurenjie1024
left a comment
Thanks @CTTY for this PR!
Which issue does this PR close?
RollingFileWriter: Helps split incoming data into multiple files #1541

What changes are included in this PR?

RollingFileWriter

Are these changes tested?

added unit tests