When `ParquetWriter` writes a `RecordBatch` and counts NaN values, it needs to walk the RecordBatch's Arrow schema and the Iceberg schema together in a partner fashion:
```rust
.compute(self.schema.clone(), batch_c)?;
```
The call stack is roughly `NanValueCountVisitor::compute` -> `visit_struct_with_partner` -> `ArrowArrayAccessor::field_partner` -> `get_field_id`.
This fails in `get_field_id` when the incoming Arrow schema's fields don't carry the `PARQUET:field_id` metadata key, which is the case when data is inserted through DataFusion.
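To illustrate the failure mode, here is a minimal, self-contained sketch (not the actual iceberg-rust code) of the kind of metadata lookup `get_field_id` performs. The function name is reused for clarity, but the signature and error type are hypothetical; an Arrow field's metadata is modeled as a plain `HashMap<String, String>`:

```rust
use std::collections::HashMap;

// The Arrow metadata key that carries the Iceberg/Parquet field id.
const PARQUET_FIELD_ID_META_KEY: &str = "PARQUET:field_id";

// Hypothetical simplification of the lookup: it only succeeds when the
// field's metadata actually contains `PARQUET:field_id`.
fn get_field_id(field_metadata: &HashMap<String, String>) -> Result<i32, String> {
    field_metadata
        .get(PARQUET_FIELD_ID_META_KEY)
        .ok_or_else(|| format!("field metadata has no {PARQUET_FIELD_ID_META_KEY} key"))?
        .parse::<i32>()
        .map_err(|e| format!("invalid field id: {e}"))
}

fn main() {
    // A field written through a schema that carries field ids resolves fine.
    let mut with_id = HashMap::new();
    with_id.insert(PARQUET_FIELD_ID_META_KEY.to_string(), "1".to_string());
    assert_eq!(get_field_id(&with_id), Ok(1));

    // A schema produced without field-id metadata (as when DataFusion
    // builds the Arrow schema) makes the lookup fail.
    let without_id: HashMap<String, String> = HashMap::new();
    assert!(get_field_id(&without_id).is_err());
}
```

A possible direction for a fix is to fall back to positional matching (pairing Arrow fields with Iceberg fields by index/name) when the metadata key is absent, rather than erroring out.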
Originally posted by @CTTY in #1511 (comment)