ParquetWriter should account for the buffer of the inner writer when checking current_written_size #1554

@CTTY

Description

Apache Iceberg Rust version

None

Describe the bug

The existing implementation of ParquetWriter::current_written_size (link) is inaccurate because it does not take the inner writer's buffer into account: self.written_size only becomes accurate once the parquet writer is closed. A more detailed analysis can be found here

To Reproduce

No response

Expected behavior

We should use inner.bytes_written + inner.in_progress_size to get an estimated size for current_written_size, so that both the bytes already flushed and the bytes still buffered by the inner writer are counted.
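A minimal sketch of the proposed behavior, to make the expectation concrete. The InnerWriter and ParquetWriter types below are hypothetical stand-ins, not the actual iceberg-rust or parquet code; they model only the two counters named above (bytes_written and in_progress_size) and assume the inner writer is dropped when the writer is closed.

```rust
/// Hypothetical stand-in for the inner writer. It tracks only the two
/// counters this issue refers to; sizes are passed in directly instead of
/// being derived from real record batches.
#[derive(Default)]
struct InnerWriter {
    /// Bytes already flushed to the underlying output.
    flushed: usize,
    /// Bytes still buffered for the row group in progress.
    buffered: usize,
}

impl InnerWriter {
    fn write(&mut self, encoded_len: usize) {
        self.buffered += encoded_len;
    }

    fn flush(&mut self) {
        self.flushed += self.buffered;
        self.buffered = 0;
    }

    fn bytes_written(&self) -> usize {
        self.flushed
    }

    fn in_progress_size(&self) -> usize {
        self.buffered
    }
}

/// Hypothetical, simplified ParquetWriter (assumption: `inner` is taken on close).
struct ParquetWriter {
    inner: Option<InnerWriter>,
    /// Only set when the writer is closed.
    written_size: usize,
}

impl ParquetWriter {
    fn new() -> Self {
        Self { inner: Some(InnerWriter::default()), written_size: 0 }
    }

    fn write(&mut self, encoded_len: usize) {
        if let Some(inner) = self.inner.as_mut() {
            inner.write(encoded_len);
        }
    }

    fn flush(&mut self) {
        if let Some(inner) = self.inner.as_mut() {
            inner.flush();
        }
    }

    /// Proposed estimate: flushed bytes plus buffered bytes while the inner
    /// writer is open, falling back to the final size after close.
    fn current_written_size(&self) -> usize {
        match &self.inner {
            Some(inner) => inner.bytes_written() + inner.in_progress_size(),
            None => self.written_size,
        }
    }

    fn close(&mut self) {
        if let Some(mut inner) = self.inner.take() {
            inner.flush();
            self.written_size = inner.bytes_written();
        }
    }
}
```

With this shape, a caller that rolls over to a new file at a target size sees the size grow on every write rather than only after close. In a real writer the in-progress figure is an estimate, since buffered data has not yet been encoded and compressed into its final form.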

Willingness to contribute

I can contribute a fix for this bug independently

Labels

bug