Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Parquet now has geometry and geography types (see Geospatial.md), and the Rust reader can read and write them.
However, at the moment the arrow reader/writer in the parquet crate returns them as binary columns (I think).
@paleolimbot says in apache/parquet-site#123 (comment):
In arrow-rs you currently have to try pretty hard to actually get the types to be written or read but it's tested and can be done.
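To illustrate what "binary columns" means for a consumer today: the Parquet geospatial types store each value as WKB, so code that receives a plain binary column has to decode those bytes itself. Below is a minimal sketch, using only the standard library, that decodes a single 2D WKB point. The function names `decode_wkb_point` and `encode_wkb_point` are made up for this example and are not part of arrow-rs or the parquet crate; a real integration would handle every geometry type, not just points.

```rust
/// Hypothetical helper: decode a WKB-encoded 2D point into (x, y).
/// Returns None if the buffer is not a plain 2D Point.
fn decode_wkb_point(bytes: &[u8]) -> Option<(f64, f64)> {
    // Layout: 1 byte-order flag, 4-byte geometry type, two 8-byte doubles.
    if bytes.len() != 21 {
        return None;
    }
    // 0 means big-endian, 1 means little-endian.
    let little_endian = match bytes[0] {
        0 => false,
        1 => true,
        _ => return None,
    };
    let read_u32 = |b: &[u8]| -> u32 {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(b);
        if little_endian { u32::from_le_bytes(buf) } else { u32::from_be_bytes(buf) }
    };
    let read_f64 = |b: &[u8]| -> f64 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(b);
        if little_endian { f64::from_le_bytes(buf) } else { f64::from_be_bytes(buf) }
    };
    // Geometry type 1 is Point in WKB.
    if read_u32(&bytes[1..5]) != 1 {
        return None;
    }
    Some((read_f64(&bytes[5..13]), read_f64(&bytes[13..21])))
}

/// Hypothetical helper: encode a 2D point as little-endian WKB.
fn encode_wkb_point(x: f64, y: f64) -> Vec<u8> {
    let mut out = Vec::with_capacity(21);
    out.push(1u8); // little-endian flag
    out.extend_from_slice(&1u32.to_le_bytes()); // geometry type: Point
    out.extend_from_slice(&x.to_le_bytes());
    out.extend_from_slice(&y.to_le_bytes());
    out
}
```

Having to write this kind of decoding at every call site is the experience a dedicated geospatial array type could hide.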
Describe the solution you'd like
I would like a better experience of reading / writing geospatial types with arrow-rs and the arrow-reader
Describe alternatives you've considered
I think @kylebarron has some version of this API in the geoparquet crate: https://crates.io/crates/geoparquet
However, I am not sure what the
For inspiration, I would suggest an API similar to the support for Variant:
In that case we defined VariantArray that wraps a StructArray and implements all the Variant details, and then have a way to write that to/from the parquet file with the appropriate metadata. For example:
// Get a column with a geospatial type
// (GeospatialArray is the proposed wrapper type; it does not exist yet)
let array: GeospatialArray = ....;
let field = array.field("data");
let array = ArrayRef::from(array);
// Create a RecordBatch with the GeospatialArray
let schema = Schema::new(vec![field]);
let batch = RecordBatch::try_new(Arc::new(schema), vec![array])?;
// Now you can write the RecordBatch to the Parquet file, as normal
let file = std::fs::File::create("geospatial.parquet")?;
let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
writer.write(&batch)?;
writer.close()?;
Additional context
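As a rough illustration of the wrapper pattern suggested above (a typed array that wraps the underlying storage and carries the type-specific details), here is a standard-library-only sketch. Everything in it is hypothetical: `BinaryColumn` stands in for an Arrow binary array, and `GeometryColumn`, `try_new`, `crs`, and `into_inner` are invented names, not a proposal for the final API. The validation is deliberately shallow and only checks the WKB header.

```rust
/// Hypothetical stand-in for an Arrow binary column: one optional buffer per row.
type BinaryColumn = Vec<Option<Vec<u8>>>;

/// Hypothetical wrapper, loosely analogous to how VariantArray wraps a StructArray.
#[derive(Debug)]
struct GeometryColumn {
    /// WKB-encoded values, None for null rows.
    values: BinaryColumn,
    /// Coordinate reference system carried alongside the data as metadata.
    crs: Option<String>,
}

impl GeometryColumn {
    /// Wrap a binary column, rejecting rows that cannot be WKB.
    fn try_new(values: BinaryColumn, crs: Option<String>) -> Result<Self, String> {
        for (row, value) in values.iter().enumerate() {
            if let Some(bytes) = value {
                // WKB starts with a byte-order flag (0 or 1) and a 4-byte geometry type.
                if bytes.len() < 5 || bytes[0] > 1 {
                    return Err(format!("row {} is not valid WKB", row));
                }
            }
        }
        Ok(Self { values, crs })
    }

    /// Number of rows, including nulls.
    fn len(&self) -> usize {
        self.values.len()
    }

    /// The coordinate reference system, if one was attached.
    fn crs(&self) -> Option<&str> {
        self.crs.as_deref()
    }

    /// Give back the underlying storage, e.g. to hand it to a writer.
    fn into_inner(self) -> BinaryColumn {
        self.values
    }
}
```

The point of the design is that validation and metadata live in one place, while the storage stays an ordinary column that the existing writer path could consume.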