Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is part of the larger project to implement StringViewArray -- see #5374
In #5481 we added support for StringViewArray and BinaryViewArray.
The parquet crate has a reader and writer for reading/writing parquet data to arrow:
- reader: https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/index.html
- writer: https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/index.html
Describe the solution you'd like
I would like to be able to read StringViewArray and BinaryViewArray directly from the parquet reader, and write them with the parquet writer, without copying the raw byte values.
- Add functionality
- Add tests
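To make the "no data copies" goal concrete, here is a minimal, std-only sketch of the idea. It assumes the Arrow view layout (a 16-byte view holding a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix, buffer index, and offset) and the Parquet PLAIN encoding for byte arrays (a 4-byte little-endian length followed by the bytes). The function names `make_view`, `resolve`, and `views_over_plain_page` are hypothetical and are not part of the arrow or parquet crates; the actual reader may be structured quite differently.

```rust
/// Hypothetical helper: build a 16-byte view for `value`, which is assumed to
/// live at `offset` inside the data buffer numbered `buffer_index`.
fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= 12 {
        // Short values are stored inline in the view, zero padded.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Long values keep a 4-byte prefix plus a pointer into the buffer.
        view[4..8].copy_from_slice(&value[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Hypothetical helper: look up the bytes a view refers to.
fn resolve<'a>(view: &'a [u8; 16], buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = u32::from_le_bytes([view[0], view[1], view[2], view[3]]) as usize;
    if len <= 12 {
        &view[4..4 + len]
    } else {
        let idx = u32::from_le_bytes([view[8], view[9], view[10], view[11]]) as usize;
        let off = u32::from_le_bytes([view[12], view[13], view[14], view[15]]) as usize;
        &buffers[idx][off..off + len]
    }
}

/// Hypothetical helper: walk a PLAIN-encoded page of byte arrays and emit one
/// view per value. Values longer than 12 bytes are referenced in place, so the
/// page itself can act as the data buffer. Panics on a truncated page; a real
/// reader would return an error instead.
fn views_over_plain_page(page: &[u8], buffer_index: u32) -> Vec<[u8; 16]> {
    let mut views = Vec::new();
    let mut pos = 0usize;
    while pos + 4 <= page.len() {
        let len = u32::from_le_bytes([page[pos], page[pos + 1], page[pos + 2], page[pos + 3]])
            as usize;
        let start = pos + 4;
        views.push(make_view(&page[start..start + len], buffer_index, start as u32));
        pos = start + len;
    }
    views
}
```

In this sketch only values of 12 bytes or fewer are copied (into the view itself); longer values stay where the page decoder left them, which is the property this request is after.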
Describe alternatives you've considered
For example, I think we need to add support to the writer here:
arrow-rs/parquet/src/arrow/arrow_writer/mod.rs
Lines 719 to 732 in f41c2a4
        ArrowDataType::Dictionary(_, value_type) => match value_type.as_ref() {
            ArrowDataType::Utf8 | ArrowDataType::LargeUtf8 | ArrowDataType::Binary | ArrowDataType::LargeBinary => {
                out.push(bytes(leaves.next().unwrap()))
            }
            _ => {
                out.push(col(leaves.next().unwrap()))
            }
        }
        _ => return Err(ParquetError::NYI(
            format!(
                "Attempting to write an Arrow type {data_type:?} to parquet that is not yet implemented"
            )
        ))
    }
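One way the dispatch might be extended is sketched below. This is a std-only model, not the real writer: `DataType`, `WriterKind`, and `writer_kind` are stand-ins written for illustration (the real code uses `ArrowDataType` and pushes writers onto `out`), and `Unsupported` is a placeholder variant for any type the writer does not handle. The assumption is that the view types would be routed to the same byte-array path that `Utf8` and `Binary` use.

```rust
/// Stand-in for a small subset of the Arrow data types (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    Utf8View,
    BinaryView,
    Dictionary(Box<DataType>, Box<DataType>),
    /// Placeholder for any type the writer does not support yet.
    Unsupported,
}

/// Stand-in for the two writer flavours chosen by `bytes(..)` and `col(..)`.
#[derive(Debug, PartialEq)]
enum WriterKind {
    Bytes,
    Column,
}

/// Hypothetical dispatch: choose a writer flavour for a leaf column.
fn writer_kind(data_type: &DataType) -> Result<WriterKind, String> {
    match data_type {
        DataType::Int32 => Ok(WriterKind::Column),
        DataType::Utf8
        | DataType::LargeUtf8
        | DataType::Binary
        | DataType::LargeBinary
        // Assumed addition: treat view arrays like the other byte-array types.
        | DataType::Utf8View
        | DataType::BinaryView => Ok(WriterKind::Bytes),
        // Unchanged from the snippet above: dictionaries dispatch on value type.
        DataType::Dictionary(_, value_type) => match value_type.as_ref() {
            DataType::Utf8
            | DataType::LargeUtf8
            | DataType::Binary
            | DataType::LargeBinary => Ok(WriterKind::Bytes),
            _ => Ok(WriterKind::Column),
        },
        DataType::Unsupported => Err(format!(
            "Attempting to write an Arrow type {data_type:?} to parquet that is not yet implemented"
        )),
    }
}
```

With this shape, `writer_kind(&DataType::Utf8View)` selects the byte-array path instead of falling through to the "not yet implemented" error. Whether dictionaries with view-typed values also need handling is left open here.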
Additional context
The reader and writer already handle DictionaryArrays, which I think could serve as a model for the view arrays.
@ariesdevil reports they are working on this feature #5374 (comment)