When `ParquetWriter` writes a `RecordBatch` and counts NaN values, it needs to walk the RecordBatch's Arrow schema and the Iceberg schema together in a partner fashion:
```rust
.compute(self.schema.clone(), batch_c)?;
```
The call stack is roughly `NanValueCountVisitor::compute` -> `visit_struct_with_partner` -> `ArrowArrayAccessor::field_partner` -> `get_field_id`.
This fails in `get_field_id` when the incoming Arrow schema's fields don't carry the `PARQUET:field_id` metadata key, which is the case when data is inserted through DataFusion.
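To illustrate the failure mode, here is a minimal, self-contained sketch (not the actual iceberg-rust code) of the kind of metadata lookup `get_field_id` performs. The function name is reused for clarity, but the signature and error type are hypothetical; an Arrow field's metadata is modeled as a plain `HashMap<String, String>`:

```rust
use std::collections::HashMap;

// The Arrow metadata key that carries the Iceberg/Parquet field id.
const PARQUET_FIELD_ID_META_KEY: &str = "PARQUET:field_id";

// Hypothetical simplification of the lookup: it only succeeds when the
// field's metadata actually contains `PARQUET:field_id`.
fn get_field_id(field_metadata: &HashMap<String, String>) -> Result<i32, String> {
    field_metadata
        .get(PARQUET_FIELD_ID_META_KEY)
        .ok_or_else(|| format!("field metadata has no {PARQUET_FIELD_ID_META_KEY} key"))?
        .parse::<i32>()
        .map_err(|e| format!("invalid field id: {e}"))
}

fn main() {
    // A field written through a schema that carries field ids resolves fine.
    let mut with_id = HashMap::new();
    with_id.insert(PARQUET_FIELD_ID_META_KEY.to_string(), "1".to_string());
    assert_eq!(get_field_id(&with_id), Ok(1));

    // A schema produced without field-id metadata (as when DataFusion
    // builds the Arrow schema) makes the lookup fail.
    let without_id: HashMap<String, String> = HashMap::new();
    assert!(get_field_id(&without_id).is_err());
}
```

A possible direction for a fix is to fall back to positional matching (pairing Arrow fields with Iceberg fields by index/name) when the metadata key is absent, rather than erroring out.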
Originally posted by @CTTY in #1511 (comment)