Closed
Labels
enhancement (New feature or request), good first issue (Good for newcomers)
Description
Is your feature request related to a problem or challenge?
Part of #11752
While working to enable StringView in #12092, I found that columns read as StringView or BinaryView do not take advantage of Bloom filters.
Specifically, this code doesn't handle StringView:
datafusion/datafusion/core/src/datasource/physical_plan/parquet/row_group_filter.rs
Lines 267 to 272 in a08f923
    ScalarValue::Utf8(Some(v)) => sbbf.check(&v.as_str()),
    ScalarValue::Binary(Some(v)) => sbbf.check(v),
    ScalarValue::FixedSizeBinary(_size, Some(v)) => sbbf.check(v),
    ScalarValue::Boolean(Some(v)) => sbbf.check(v),
    ScalarValue::Float64(Some(v)) => sbbf.check(v),
    ScalarValue::Float32(Some(v)) => sbbf.check(v),
Describe the solution you'd like
Support applying parquet bloom filters to StringView columns
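As a rough illustration of the shape of the change, here is a minimal, self-contained sketch. It does not use the real DataFusion or parquet crates: the `ScalarValue` enum below is a cut-down stand-in that only mirrors the string/binary variants, `ToyBloom` is a hypothetical toy filter (not parquet's split-block `Sbbf`), and `may_contain` is an illustrative helper name. The point is only that the view variants can share match arms with their non-view counterparts, since they carry the same payload.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for DataFusion's `ScalarValue`, reduced to the variants
/// relevant here (assumption: the view variants carry the same payload
/// types as `Utf8` / `Binary`).
#[derive(Debug, Clone)]
enum ScalarValue {
    Utf8(Option<String>),
    Utf8View(Option<String>),
    Binary(Option<Vec<u8>>),
    BinaryView(Option<Vec<u8>>),
}

/// Toy 64-bit Bloom filter used only for illustration.
/// This is NOT the split-block algorithm used by parquet's `Sbbf`.
struct ToyBloom {
    bits: u64,
}

impl ToyBloom {
    fn new() -> Self {
        Self { bits: 0 }
    }

    /// Derive two bit positions from a hash of the raw bytes.
    fn bit_positions(bytes: &[u8]) -> [u32; 2] {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        let h = hasher.finish();
        [(h % 64) as u32, ((h >> 32) % 64) as u32]
    }

    fn insert(&mut self, bytes: &[u8]) {
        for p in Self::bit_positions(bytes) {
            self.bits |= 1u64 << p;
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    fn check(&self, bytes: &[u8]) -> bool {
        Self::bit_positions(bytes)
            .iter()
            .all(|p| (self.bits & (1u64 << p)) != 0)
    }
}

/// Returns `false` only when the filter proves the literal is absent,
/// in which case the row group could be pruned.
fn may_contain(filter: &ToyBloom, literal: &ScalarValue) -> bool {
    match literal {
        // View variants are checked exactly like their non-view counterparts.
        ScalarValue::Utf8(Some(v)) | ScalarValue::Utf8View(Some(v)) => {
            filter.check(v.as_bytes())
        }
        ScalarValue::Binary(Some(v)) | ScalarValue::BinaryView(Some(v)) => {
            filter.check(v)
        }
        // NULL literals cannot be checked, so conservatively keep the row group.
        _ => true,
    }
}
```

In the actual code, the analogous change would presumably be extra match arms for the view variants next to the existing `Utf8` and `Binary` arms; whether anything else in the pruning path needs to change is not covered by this sketch.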
Describe alternatives you've considered
Basically:
- Make the code changes for bloom filters in "Enable reading StringViewArray by default from Parquet" (#12092)
- Write a test
For testing, I think the easiest approach is to follow the model of the existing tests for Utf8/Binary columns and pass the schema_force_view_types config flag.
Additional context
No response