Skip to content

Conversation

@Fokko
Copy link
Contributor

@Fokko Fokko commented Dec 19, 2023

While working on the write support, I started to understand the meaning of the Arrow metadata. The PARQUET: prefix means that it is specific to Parquet, and setting PARQUET:field_id will actually not be set as metadata, but as the field_id. I think we should remove the other option without the prefix.

I also did the same for the doc. I checked, in Java we don't check for the documentation, but I think it is a nice addition. However, I think we should only check for the key that's the same as the Iceberg field (doc).

While working on the write support, I started to understand the
meaning of the Arrow metadata. The `PARQUET:` prefix means that
it is specific to Parquet, and setting `PARQUET:field_id` will
actually not be set as metadata, but as the `field_id`. I think
we should remove the other option without the prefix.

I also did the same for the doc. I checked, in Java we don't check
for the documentation, but I think it is a nice addition. However,
I think we should only check for the key that's the same as the
Iceberg field (`doc`).
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Fokko for the fix and @jqin61 for flagging this! This really clears up my confusion about how Arrow manages its field metadata. I have some comments below.

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just have one small comment

@HonahX HonahX mentioned this pull request Jan 6, 2024
@HonahX HonahX merged commit 4593208 into apache:main Jan 12, 2024
@Fokko Fokko deleted the fd-set-field-id-with-prefix branch January 12, 2024 22:44
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Jan 13, 2024
'PARQUET:' prefix is specific to Parquet, with 'PARQUET:field_id' setting the 'field_id'. Removed the non-prefixed alternative for 'field_id'. Removed the prefixed alternative for 'doc'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants