
[geometry] Wire up arrow reader/writer for GEOMETRY and GEOGRAPHY #8717

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Parquet now has the GEOMETRY and GEOGRAPHY logical types (see Geospatial.md), and the Rust reader can read and write them.

However, at the moment the arrow reader/writer in the parquet crate returns them as binary columns (I think).

@paleolimbot says in apache/parquet-site#123 (comment):

In arrow-rs you currently have to try pretty hard to actually get the types to be written or read but it's tested and can be done.
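To illustrate what "binary columns" means for a caller today: each GEOMETRY or GEOGRAPHY value is a WKB-encoded byte string, so the caller has to interpret the bytes themselves. The following is a minimal sketch, using only the standard library, that decodes the simplest case (a 2D WKB Point). The function name `decode_wkb_point` is hypothetical and is not part of arrow-rs or the parquet crate.

```rust
/// Decode a 2D WKB Point into `(x, y)`.
/// Hypothetical helper for illustration; returns `None` for any other input.
fn decode_wkb_point(wkb: &[u8]) -> Option<(f64, f64)> {
    // Layout: 1 byte-order byte + 4-byte geometry type + two 8-byte coordinates.
    if wkb.len() != 21 {
        return None;
    }
    // WKB byte-order flag: 0 = big endian, 1 = little endian.
    let little_endian = match wkb[0] {
        0 => false,
        1 => true,
        _ => return None,
    };
    let read_u32 = |b: &[u8]| -> u32 {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(b);
        if little_endian { u32::from_le_bytes(buf) } else { u32::from_be_bytes(buf) }
    };
    let read_f64 = |b: &[u8]| -> f64 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(b);
        if little_endian { f64::from_le_bytes(buf) } else { f64::from_be_bytes(buf) }
    };
    // Geometry type code 1 is Point.
    if read_u32(&wkb[1..5]) != 1 {
        return None;
    }
    Some((read_f64(&wkb[5..13]), read_f64(&wkb[13..21])))
}
```

Calling `decode_wkb_point` on each value of the binary column yields the coordinates for point data. Other geometry types, and anything related to the CRS, would need considerably more code, which is part of the motivation for first-class support.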

Describe the solution you'd like

I would like a better experience for reading and writing geospatial types with arrow-rs and the arrow reader/writer.

Describe alternatives you've considered

I think @kylebarron has some version of this API in the geoparquet crate: https://crates.io/crates/geoparquet

However, I am not sure what the

For inspiration, I would suggest an API similar to the support for Variant:

https://docs.rs/parquet/latest/parquet/variant/index.html#example-writing-a-parquet-file-with-variant-column

In that case we defined a VariantArray that wraps a StructArray and implements all the Variant details, and then added a way to write it to and read it from a Parquet file with the appropriate metadata. For example:

 // Get a column with a Geospatial type (GeospatialArray is the proposed type; it does not exist yet):
 let array: GeospatialArray = ....;
 // Build the field that carries the geospatial metadata, then erase to a plain ArrayRef
 let field = array.field("data");
 let array = ArrayRef::from(array);
 // Create a RecordBatch with the GeospatialArray
 let schema = Schema::new(vec![field]);
 let batch = RecordBatch::try_new(Arc::new(schema), vec![array])?;

 // Now you can write the RecordBatch to the Parquet file, as normal
 let file = std::fs::File::create("geospatial.parquet")?;
 let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
 writer.write(&batch)?;
 writer.close()?;
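The wrapper idea in the example above could be sketched roughly as follows. This is a standard-library-only illustration of the shape, not a proposed implementation: `GeospatialArray`, `SketchField`, and the metadata keys `logical_type` and `crs` are all hypothetical stand-ins. A real version would wrap an Arrow binary array and return an Arrow `Field`.

```rust
use std::collections::HashMap;

/// Stand-in for an Arrow field: a name plus string metadata.
/// Hypothetical; a real implementation would use Arrow's `Field`.
#[derive(Debug, Clone, PartialEq)]
struct SketchField {
    name: String,
    metadata: HashMap<String, String>,
}

/// Hypothetical wrapper: WKB values held as plain binary storage, plus the
/// logical-type details (here only an optional CRS string).
struct GeospatialArray {
    wkb_values: Vec<Option<Vec<u8>>>,
    crs: Option<String>,
}

impl GeospatialArray {
    fn new(wkb_values: Vec<Option<Vec<u8>>>, crs: Option<String>) -> Self {
        Self { wkb_values, crs }
    }

    /// Mirrors `array.field("data")` above: the returned field carries the
    /// metadata a writer would need to emit the geospatial annotation.
    fn field(&self, name: &str) -> SketchField {
        let mut metadata = HashMap::new();
        // Assumed metadata keys, for illustration only.
        metadata.insert("logical_type".to_string(), "GEOMETRY".to_string());
        if let Some(crs) = &self.crs {
            metadata.insert("crs".to_string(), crs.clone());
        }
        SketchField { name: name.to_string(), metadata }
    }

    /// Mirrors `ArrayRef::from(array)` above: give up the wrapper and hand
    /// back the underlying binary storage.
    fn into_storage(self) -> Vec<Option<Vec<u8>>> {
        self.wkb_values
    }
}
```

The point of the design is that the storage stays an ordinary binary column while the field metadata records that it is geospatial, so the writer path needs no special array type, only a check of the field.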

Additional context
