
Add support for GEOMETRY and GEOGRAPHY types in Parquet read and/or write #7240

@paleolimbot


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I'd like to be able to read and/or write Parquet files with the new GEOMETRY and GEOGRAPHY types!

Describe the solution you'd like

Support for read and/or write (perhaps read first and then write).

Describe alternatives you've considered

Additional context

I think the main issue is which Arrow type to read into. The Parquet types carry type-level metadata (a coordinate reference system, plus an edge interpolation algorithm for GEOGRAPHY), which can be propagated via the geoarrow.wkb extension type (https://github.com/geoarrow/geoarrow/blob/main/extension-types.md#extension-metadata). The most complicated mapping scenario looks something like:

Parquet: GEOGRAPHY(crs=projjson:some_file_metadata_field, algorithm=spherical) -> Arrow: geoarrow.wkb + {"crs": {<the actual projjson>}, "edges": "spherical"}

(The Parquet spec "recommends" putting the actual PROJJSON into the file metadata. I tried to discourage this when negotiating the spec change but was ultimately unsuccessful.)
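To make that mapping concrete, here is a rough sketch in plain Python (no Arrow or Parquet library involved). The function name `geoarrow_field_metadata`, its parameters, and the sample PROJJSON are hypothetical and only for illustration. It also assumes that a `projjson:` prefix names a file metadata key, that other CRS strings can be passed through unchanged, and that a missing algorithm means spherical edges. None of this is an existing reader API.

```python
import json

PROJJSON_PREFIX = "projjson:"


def geoarrow_field_metadata(logical_type, crs, algorithm, file_metadata):
    """Build Arrow field metadata for geoarrow.wkb from Parquet type parameters.

    Hypothetical helper: the real type-mapping code may look quite different.
    """
    ext = {}
    if crs is not None:
        if crs.startswith(PROJJSON_PREFIX):
            # The type only references the CRS; the PROJJSON itself is
            # assumed to live in the file-level key/value metadata.
            key = crs[len(PROJJSON_PREFIX):]
            ext["crs"] = json.loads(file_metadata[key])
        else:
            # Assumption: any other CRS string is passed through as-is.
            ext["crs"] = crs
    if logical_type == "GEOGRAPHY":
        # Assumption: an unspecified algorithm is treated as spherical.
        ext["edges"] = (algorithm or "spherical").lower()
    return {
        "ARROW:extension:name": "geoarrow.wkb",
        "ARROW:extension:metadata": json.dumps(ext),
    }


# Made-up file metadata holding a (truncated, illustrative) PROJJSON document.
file_metadata = {
    "some_file_metadata_field": '{"type": "GeographicCRS", "name": "WGS 84"}'
}
field_md = geoarrow_field_metadata(
    "GEOGRAPHY", "projjson:some_file_metadata_field", "spherical", file_metadata
)
```

The point of the sketch is that the reader has to resolve the `projjson:` reference against the file metadata before it can build the extension metadata, which is what makes this the awkward case.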

I haven't looked at the existing type mapping code, but I think I remember reading that the recent ExtensionType change was followed up with the ability to inspect/generate field metadata on the way in/out of Parquet, so that metadata is propagated wherever possible.

Right now GeoArrow extension types are listed as "community extension types", which I believe is a category made up just for us. It may be that voting to move geoarrow.wkb into the "canonical extension type" category is a precursor to finalizing this implementation, which is definitely fair 🙂.

I'm happy to attempt this when I get a chance (unless @kylebarron is chomping at the bit to do it or has already done it!).
