@ZENOTME (Contributor) commented Dec 13, 2024

This PR exposes `_serde::DataFile` so that users can serialize and deserialize data files. Related issue: #774

@ZENOTME (Contributor, Author) commented Dec 13, 2024

cc @liurenjie1024 @Xuanwo @Fokko @sdd

@ZENOTME (Contributor, Author) commented Dec 18, 2024

I've changed this PR to add an interface for serializing/deserializing `DataFile` to and from Avro bytes. The idea comes from #774 (comment).

I think this is a good start for #774: it gives users an interface to serialize and deserialize `DataFile`.

Later, we can discuss whether to make `DataFile` itself serializable. That would essentially mean storing more information in `DataFile`, so that callers no longer have to pass that information (e.g. the partition type) as parameters to the interface.
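To make that trade-off concrete, here is a minimal, std-only Rust sketch of why the partition type has to be passed in when `DataFile` does not carry it. Everything in it is hypothetical: `PrimitiveType`, `StructType`, `DataFile`, `take`, `write_data_file`, and `read_data_file` are simplified stand-ins, not iceberg-rust's actual types or functions, and a toy little-endian encoding stands in for Avro. What it shows is that the reader cannot decode partition values without knowing their types.

```rust
// Hypothetical, simplified stand-ins: these are NOT iceberg-rust's real types or API.
#[derive(Debug, Clone, PartialEq)]
enum PrimitiveType {
    Int,  // encoded as 4 bytes
    Long, // encoded as 8 bytes
}

#[derive(Debug, Clone, PartialEq)]
struct StructType {
    fields: Vec<PrimitiveType>,
}

#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    file_path: String,
    // Only the partition *values* are stored; their types come from the partition spec.
    partition: Vec<i64>,
}

// Reads `n` bytes starting at `*pos` and advances the cursor.
fn take<'a>(bytes: &'a [u8], pos: &mut usize, n: usize) -> Result<&'a [u8], String> {
    let end = pos.checked_add(n).ok_or("length overflow")?;
    let slice = bytes.get(*pos..end).ok_or("unexpected end of input")?;
    *pos = end;
    Ok(slice)
}

// The caller supplies `partition_type` because `DataFile` does not carry it.
fn write_data_file(file: &DataFile, partition_type: &StructType) -> Result<Vec<u8>, String> {
    if file.partition.len() != partition_type.fields.len() {
        return Err("partition arity does not match partition type".to_string());
    }
    let mut out = Vec::new();
    let path = file.file_path.as_bytes();
    out.extend_from_slice(&(path.len() as u32).to_le_bytes());
    out.extend_from_slice(path);
    for (value, ty) in file.partition.iter().zip(&partition_type.fields) {
        match *ty {
            PrimitiveType::Int => {
                if *value < i32::MIN as i64 || *value > i32::MAX as i64 {
                    return Err("value out of range for int".to_string());
                }
                out.extend_from_slice(&(*value as i32).to_le_bytes());
            }
            PrimitiveType::Long => out.extend_from_slice(&value.to_le_bytes()),
        }
    }
    Ok(out)
}

// Without `partition_type`, the reader could not know how many bytes each value uses.
fn read_data_file(bytes: &[u8], partition_type: &StructType) -> Result<DataFile, String> {
    let mut pos = 0;
    let mut len_buf = [0u8; 4];
    len_buf.copy_from_slice(take(bytes, &mut pos, 4)?);
    let path_len = u32::from_le_bytes(len_buf) as usize;
    let path = take(bytes, &mut pos, path_len)?.to_vec();
    let file_path = String::from_utf8(path).map_err(|e| e.to_string())?;

    let mut partition = Vec::with_capacity(partition_type.fields.len());
    for ty in &partition_type.fields {
        let value = match *ty {
            PrimitiveType::Int => {
                let mut buf = [0u8; 4];
                buf.copy_from_slice(take(bytes, &mut pos, 4)?);
                i32::from_le_bytes(buf) as i64
            }
            PrimitiveType::Long => {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(take(bytes, &mut pos, 8)?);
                i64::from_le_bytes(buf)
            }
        };
        partition.push(value);
    }
    Ok(DataFile { file_path, partition })
}
```

Round-tripping a file through `write_data_file` and `read_data_file` with the same partition type returns an equal `DataFile`; decoding the same bytes with a different partition type would misread or reject them. That type information is exactly what a self-describing `DataFile` would have to store.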

@ZENOTME changed the title from "feat: expose _serde::DataFile" to "feat: support serialize/deserialize DataFile into avro bytes" Dec 19, 2024

@liurenjie1024 (Contributor) left a comment

Thanks @ZENOTME for this PR. I left some suggestions to improve API consistency; otherwise LGTM.

})
};

fn data_file_fields_v2(partition_type: StructType) -> Vec<NestedFieldRef> {
Contributor

Suggested change
- fn data_file_fields_v2(partition_type: StructType) -> Vec<NestedFieldRef> {
+ fn data_file_fields_v2(partition_type: &StructType) -> Vec<NestedFieldRef> {

]
}

pub(super) fn data_file_schema_v2(partition_type: StructType) -> Result<AvroSchema, Error> {
Contributor

Suggested change
- pub(super) fn data_file_schema_v2(partition_type: StructType) -> Result<AvroSchema, Error> {
+ pub(super) fn data_file_schema_v2(partition_type: &StructType) -> Result<AvroSchema, Error> {

schema_to_avro_schema("manifest_entry", &schema)
}

fn data_file_fields_v1(partition_type: StructType) -> Vec<NestedFieldRef> {
Contributor

Suggested change
- fn data_file_fields_v1(partition_type: StructType) -> Vec<NestedFieldRef> {
+ fn data_file_fields_v1(partition_type: &StructType) -> Vec<NestedFieldRef> {
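The suggestions switch `partition_type: StructType` to `partition_type: &StructType`. The review frames this as API consistency; a common practical effect in Rust is that a borrowed parameter lets one caller-owned value feed several builders (here, v1 and v2) without cloning. The sketch below is hypothetical: its `StructType` and the `describe_*` / `build_*` helpers are illustrative stand-ins, not the real iceberg-rust code, and it only contrasts the two signature styles.

```rust
// Hypothetical stand-in, not iceberg-rust's real StructType.
#[derive(Debug, Clone, PartialEq)]
struct StructType {
    field_names: Vec<String>,
}

// Takes ownership: a caller that still needs `partition_type` afterwards must clone it first.
fn describe_owned(partition_type: StructType) -> String {
    format!("partition<{}>", partition_type.field_names.join(","))
}

// Borrows: the caller keeps ownership, so one value can feed several builders.
fn describe_borrowed(partition_type: &StructType) -> String {
    format!("partition<{}>", partition_type.field_names.join(","))
}

fn build_v1_and_v2(partition_type: &StructType) -> (String, String) {
    // Both builders reuse the same borrowed value; no clone is needed.
    let v1 = describe_borrowed(partition_type);
    let v2 = describe_borrowed(partition_type);
    (v1, v2)
}

fn build_v1_and_v2_owned(partition_type: StructType) -> (String, String) {
    // The first call would move the value, so a clone is required to call twice.
    let v1 = describe_owned(partition_type.clone());
    let v2 = describe_owned(partition_type);
    (v1, v2)
}
```

Both variants produce the same output; the borrowed form simply avoids the extra `clone()` and keeps the v1 and v2 helpers' signatures uniform.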

@Xuanwo (Member) left a comment

Only a small nit.

@ZENOTME requested a review from Xuanwo January 2, 2025 05:41

@Xuanwo (Member) left a comment

Thank you @ZENOTME for working on this!

@Xuanwo merged commit 09fa1fa into apache:main Jan 2, 2025
16 checks passed