Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
This ticket is a follow-on to #10918, where we implemented enough initial support for StringView / BinaryView to show some pretty sweet ClickBench results.
Describe the solution you'd like
This epic tracks the remaining work to complete the "initial" phase, which I would like to define as "enable using StringView when reading Strings from Parquet by default".
I am sure there will be additional work to add StringView support to various other features of DataFusion, which we can maybe track with another follow-on ticket.
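For readers new to the type, here is a minimal sketch of the 16-byte "view" that the Arrow format defines for StringView / BinaryView: values of at most 12 bytes are stored inline, and longer values keep a 4-byte prefix plus a buffer index and offset. The function names (`make_view`, `view_len`, `is_inline`, `view_prefix`) are hypothetical and use only the Rust standard library; this is not the arrow-rs API.

```rust
/// Values of at most this many bytes are stored inline in the view.
const MAX_INLINE_LEN: usize = 12;

/// Build a 16-byte view for `s` (hypothetical helper, not an arrow-rs function).
/// `buffer_index` and `offset` locate a long value in a separate data buffer;
/// they are ignored when the value fits inline.
fn make_view(s: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    // Bytes 0..4: length of the value, little-endian.
    bytes[0..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    if s.len() <= MAX_INLINE_LEN {
        // Short value: the data itself follows the length, zero-padded.
        bytes[4..4 + s.len()].copy_from_slice(s);
    } else {
        // Long value: 4-byte prefix, then buffer index, then offset.
        bytes[4..8].copy_from_slice(&s[0..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Length of the value, read from the low 32 bits of the view.
fn view_len(view: u128) -> usize {
    (view as u32) as usize
}

/// True when the value is stored entirely inside the view.
fn is_inline(view: u128) -> bool {
    view_len(view) <= MAX_INLINE_LEN
}

/// First four bytes of the value, available without touching the data buffer.
fn view_prefix(view: u128) -> [u8; 4] {
    let b = view.to_le_bytes();
    [b[4], b[5], b[6], b[7]]
}
```

Because the length and prefix live in the view itself, many comparisons can be decided without reading the data buffers, which is one reason several items below focus on keeping data in view form rather than casting back to Utf8.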
Required for enabling StringView by default:
- Enable reading StringView by default from Parquet (`schema_force_string_view`) by default #11682
- Support string concat `||` for `StringViewArray` #11766
- COUNT(DISTINCT) on StringView panics: `unreachable code: Utf8/Binary should use ArrowBytesSet` #11767
- Support protobuf serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` #12117
- Support substrait serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` #12118
- Convert `Utf8View`/`BinaryView` --> `Utf8`/`Binary` at output #12119
- Parquet statistics missing when reading `Utf8` as `Utf8View` #12123
- Support StringView for binary operators like `~`, `!~`, etc #12180
- Support applying parquet bloom filters to StringView columns #12499
- Support Binary --> String coercion for StringView/BinaryView in `LIKE` #12500
- Casting from Binary --> Utf8 to evaluate `LIKE` slows down some ClickBench queries #12509
Could work around but really should be fixed upstream
- Support casting `BinaryView` --> `Utf8` and `LargeUtf8` arrow-rs#6162
- Add support for `StringView` and `BinaryView` statistics in `StatisticsConverter` arrow-rs#6164
- Utf8View / BinaryView / `StringViewArray::slice()` and `BinaryViewArray::slice()` are slow (they allocate) arrow-rs#6408
Additional "Nice to have" Features
- [Epic] Native `StringView` support for string functions #11790
- Reduce copying in `CoalesceBatchesExec` for StringViews #11628
- Cast Utf8 -> Utf8View (not the other way around) for binary operators #11881
- Improve performance of SUBSTR for StringViewArray #12031
- Improve performance of REPEAT functions #12015
- [Epic] Complete Initial `StringView` in DataFusion #11752
- Automate testing / ensuring that string functions get the same answer for String, LargeString, StringView, DictionaryString, etc #12415