Fix ArrowReaderOptions should read with physical_file_schema so we don't need to cast back to utf8 #17

zhuqi-lucas · 2025-04-04T15:25:38Z

Which issue does this PR close?

Bring the benchmark back to normal.

Rationale for this change

ArrowReaderOptions should read with physical_file_schema so that we don't need to cast back to utf8. This brings the benchmark back to normal.

What changes are included in this PR?

Pass physical_file_schema to ArrowReaderOptions so the reader uses it when reading.
This brings the benchmark back to normal.

Are these changes tested?

Yes

Are there any user-facing changes?

No


alamb · 2025-04-04T15:42:50Z

datafusion/datasource-parquet/src/opener.rs

-                    ArrowReaderOptions::new().with_page_index(true),
+                    ArrowReaderOptions::new()
+                        .with_page_index(true)
+                        .with_schema(physical_file_schema.clone()),


this is a nice catch
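
To illustrate why handing the schema to the reader can save work, here is a minimal std-only toy model. Everything in it (`StrType`, `Column`, `ReaderOptions`, `read`, `adapt`) is hypothetical and written for this sketch; it is not the parquet crate's API, and the Utf8View/Utf8 pairing is only an assumption based on the PR title. It shows a reader that decodes into whatever type it is asked for, and an adaptation step that only has to cast when the reader's output type differs from the wanted one.

```rust
// Toy model using only std. These types are hypothetical and are NOT the
// parquet crate's API; they only mimic the shape of the change in the diff.

/// String encodings a reader might hand back for the same column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StrType {
    Utf8,
    Utf8View,
}

/// A decoded column tagged with the type it was decoded as.
#[derive(Debug, PartialEq)]
pub struct Column {
    pub ty: StrType,
    pub values: Vec<String>,
}

/// Hypothetical reader options carrying an optional schema override.
#[derive(Default)]
pub struct ReaderOptions {
    schema: Option<StrType>,
}

impl ReaderOptions {
    /// Ask the reader to decode straight into `ty`.
    pub fn with_schema(mut self, ty: StrType) -> Self {
        self.schema = Some(ty);
        self
    }
}

/// Decode `raw` as the requested type, or as the file's embedded type
/// when no schema was requested.
pub fn read(raw: &[&str], embedded: StrType, opts: &ReaderOptions) -> Column {
    Column {
        ty: opts.schema.unwrap_or(embedded),
        values: raw.iter().map(|s| s.to_string()).collect(),
    }
}

/// Post-read adaptation: returns the column in the `wanted` type together
/// with a flag saying whether a cast was required to get there.
pub fn adapt(col: Column, wanted: StrType) -> (Column, bool) {
    let needed_cast = col.ty != wanted;
    (
        Column {
            ty: wanted,
            values: col.values,
        },
        needed_cast,
    )
}
```

In this sketch, reading with `ReaderOptions::default()` from a file whose embedded type is `Utf8View` and then adapting to `Utf8` reports that a cast was needed, while reading with `ReaderOptions::default().with_schema(StrType::Utf8)` reports that none was. The resulting columns are equal either way.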

… ParquetOpener (apache#15561)

* parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener
* use file schema, avoid loading page index if unecessary
* Add comment
* add comment
* Add comment
* remove check
* fix clippy
* update sqllogictest
* restore to explain plans
* reverted
* modify access
* Fix ArrowReaderOptions should read with physical_file_schema so we do… (#17)
* Fix ArrowReaderOptions should read with physical_file_schema so we don't need to cast back to utf8
* Fix fmt
* Update opener.rs
* Always apply per-file schema during parquet read (#18)
* Update datafusion/datasource-parquet/src/opener.rs

---------

Co-authored-by: Qi Zhu <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>

Fix ArrowReaderOptions should read with physical_file_schema so we don't need to cast back to utf8

5179e38

github-actions bot added the datasource label Apr 4, 2025

zhuqi-lucas mentioned this pull request Apr 4, 2025

parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener apache/datafusion#15561

Merged

Fix fmt

d30bf38

alamb reviewed Apr 4, 2025

adriangb merged commit cd6d766 into pydantic:move-predicate Apr 4, 2025
1 check passed

alamb mentioned this pull request Apr 4, 2025

Always apply per-file schema during parquet read #18

Merged
