Skip to content

Conversation

@sungwy
Copy link
Collaborator

@sungwy sungwy commented Mar 9, 2024

Similar to how the partition fields are already initialized with required=False in partition_type, we should set required=False in the manifest_entry schema within the ManifestFile.

This PR introduces this fix first from @jqin61 's WIP PR, so other PRs that add partition values to manifest files can be unblocked.

Without this change, we cannot serialize None partition values into the ManifestEntry in the avro file.

Below WIP PRs require this change
Partitioned Write: #353
Add Files: #506

@sungwy sungwy requested a review from Fokko March 9, 2024 22:52
Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

field_id=field.field_id,
name=field.name,
field_type=partition_field_to_data_file_partition_field(field.field_type),
required=False,
Copy link
Contributor

@Fokko Fokko Mar 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of carrying this over from the field?

Suggested change
required=False,
required=field.required,

I would prefer this because Avro schema's are translated to Iceberg, and then used in reading the files. In Avro reading an optional field is different than reading a required field (in the case of an optional field, it will first read a boolean checking if the value is there).

I checked locally in tests, and there the fields are not required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @Fokko . I've made the corresponding change to partition_type to carry this over from the partition field as suggested 👍

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small suggestion, apart from that, this looks great! Thanks for catching this @syun64 and @jqin61 and splitting it out 👍

@Fokko Fokko merged commit 0ab3262 into apache:main Mar 11, 2024
@sungwy sungwy deleted the data-file-partition-nullable branch March 11, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants