Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I'd like to be able to read and/or write Parquet files with the new GEOMETRY and GEOGRAPHY types!
- Spec references: https://github.com/apache/parquet-format/blob/master/Geospatial.md + https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L240-L261
- C++ implementation PR: GH-45522: [Parquet][C++] Parquet GEOMETRY and GEOGRAPHY logical type implementations arrow#45459
- Java implementation PR: PARQUET-2417: Add statistics support to geometry logical type parquet-java#2971
- Test files: Example files for GEOMETRY and GEOGRAPHY logical type parquet-testing#70 (and a few bigger ones at https://github.com/geoarrow/geoarrow-data )
Describe the solution you'd like
Support for reading and/or writing (perhaps reading first, then writing).
Describe alternatives you've considered
Additional context
I think the main question is which Arrow type to read into. The Parquet types carry type-level metadata (a coordinate reference system, plus an edge interpolation algorithm for GEOGRAPHY) that can be propagated via the geoarrow.wkb extension type (https://github.com/geoarrow/geoarrow/blob/main/extension-types.md#extension-metadata). The most complicated mapping scenario looks something like:
Parquet: GEOGRAPHY(crs=projjson:some_file_metadata_field, algorithm=spherical) -> Arrow: geoarrow.wkb + {"crs": {<the actual projjson>}, "edges": "spherical"}
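A minimal sketch of that read-side mapping, assuming a simplified, hypothetical representation of the Parquet logical type (`GeoLogicalType`, `EdgeAlgorithm`, and `geoarrow_wkb_metadata` are illustrative names, not existing parquet or arrow-rs APIs). It follows the Geospatial.md conventions of `projjson:<key>` (look the PROJJSON up in the file key-value metadata) and `srid:<id>`, treats an unset CRS as the spec default `OGC:CRS84`, and emits the GeoArrow `"crs"`, `"crs_type"`, and `"edges"` members. The PROJJSON value is assumed to already be valid JSON and is embedded verbatim.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the Parquet edge interpolation algorithm enum.
#[derive(Clone, Copy, Debug)]
pub enum EdgeAlgorithm { Spherical, Vincenty, Thomas, Andoyer, Karney }

impl EdgeAlgorithm {
    /// GeoArrow spells the same algorithms in lowercase.
    fn as_geoarrow(self) -> &'static str {
        match self {
            EdgeAlgorithm::Spherical => "spherical",
            EdgeAlgorithm::Vincenty => "vincenty",
            EdgeAlgorithm::Thomas => "thomas",
            EdgeAlgorithm::Andoyer => "andoyer",
            EdgeAlgorithm::Karney => "karney",
        }
    }
}

/// Hypothetical, simplified view of a Parquet geospatial logical type.
pub enum GeoLogicalType {
    Geometry { crs: Option<String> },
    Geography { crs: Option<String>, algorithm: Option<EdgeAlgorithm> },
}

/// Encode a string as a JSON string literal (including the quotes).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Turn the Parquet `crs` string into GeoArrow "crs"/"crs_type" JSON members.
fn crs_members(crs: Option<&str>, file_kv: &HashMap<String, String>) -> Result<String, String> {
    match crs {
        // Unset CRS: Parquet's documented default.
        None => Ok(format!("\"crs\":{}", json_string("OGC:CRS84"))),
        Some(c) => {
            if let Some(key) = c.strip_prefix("projjson:") {
                // Embed the PROJJSON stored under `key` in the file metadata verbatim.
                let projjson = file_kv
                    .get(key)
                    .ok_or_else(|| format!("missing file metadata key {key:?}"))?;
                Ok(format!("\"crs\":{},\"crs_type\":\"projjson\"", projjson))
            } else if let Some(id) = c.strip_prefix("srid:") {
                Ok(format!("\"crs\":{},\"crs_type\":\"srid\"", json_string(id)))
            } else {
                // Anything else is passed through as an opaque CRS string.
                Ok(format!("\"crs\":{}", json_string(c)))
            }
        }
    }
}

/// Build the `ARROW:extension:metadata` JSON for a geoarrow.wkb field.
pub fn geoarrow_wkb_metadata(
    t: &GeoLogicalType,
    file_kv: &HashMap<String, String>,
) -> Result<String, String> {
    let (crs, edges) = match t {
        // GEOMETRY has planar edges, which is GeoArrow's default, so "edges" is omitted.
        GeoLogicalType::Geometry { crs } => (crs.as_deref(), None),
        // GEOGRAPHY defaults to spherical edges when no algorithm is given.
        GeoLogicalType::Geography { crs, algorithm } => (
            crs.as_deref(),
            Some(algorithm.unwrap_or(EdgeAlgorithm::Spherical).as_geoarrow()),
        ),
    };
    let mut members = vec![crs_members(crs, file_kv)?];
    if let Some(e) = edges {
        members.push(format!("\"edges\":{}", json_string(e)));
    }
    Ok(format!("{{{}}}", members.join(",")))
}
```

For the scenario above, a GEOGRAPHY column whose `crs` is `projjson:some_file_metadata_field` would produce `{"crs":<the PROJJSON from that key>,"crs_type":"projjson","edges":"spherical"}`. A real implementation would use a proper JSON library rather than string assembly; this only illustrates which pieces of Parquet metadata feed which GeoArrow members.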
(The Parquet spec "recommends" putting the actual PROJJSON into the file metadata; I tried to discourage this when negotiating the spec change but was ultimately unsuccessful.)
I haven't looked at the existing type-mapping code, but I think I remember reading that the recent ExtensionType change was followed up with the ability to inspect and generate field metadata on the way into and out of Parquet, so that metadata is propagated wherever possible.
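As a rough sketch of what that field-metadata hook might need to do for geometry columns: Arrow marks extension types with the standard field-metadata keys `ARROW:extension:name` and `ARROW:extension:metadata`. On the way in, a reader could tag the binary column as `geoarrow.wkb`; on the way out, a writer could check that tag and choose GEOMETRY or GEOGRAPHY from the GeoArrow `"edges"` value. The function and type names below are hypothetical, and the `edges` argument is assumed to be already parsed out of the extension metadata JSON (parsing is left out to keep the example dependency-free).

```rust
use std::collections::HashMap;

// Standard Arrow field-metadata keys for extension types.
const EXT_NAME: &str = "ARROW:extension:name";
const EXT_METADATA: &str = "ARROW:extension:metadata";

/// Hypothetical writer-side choice of Parquet logical type.
#[derive(Debug, PartialEq)]
pub enum ParquetGeoKind {
    Geometry,
    Geography { algorithm: &'static str },
}

/// Read path: field metadata tagging a binary column as geoarrow.wkb.
pub fn to_field_metadata(extension_metadata_json: &str) -> HashMap<String, String> {
    HashMap::from([
        (EXT_NAME.to_string(), "geoarrow.wkb".to_string()),
        (EXT_METADATA.to_string(), extension_metadata_json.to_string()),
    ])
}

/// Write path: decide the Parquet logical type for a field.
/// Returns Ok(None) when the field is not a geoarrow.wkb column.
pub fn parquet_kind(
    field_md: &HashMap<String, String>,
    edges: Option<&str>,
) -> Result<Option<ParquetGeoKind>, String> {
    if field_md.get(EXT_NAME).map(String::as_str) != Some("geoarrow.wkb") {
        return Ok(None);
    }
    let kind = match edges {
        // Absent or planar edges map to GEOMETRY.
        None | Some("planar") => ParquetGeoKind::Geometry,
        // Non-planar edges map to GEOGRAPHY with the matching algorithm.
        Some("spherical") => ParquetGeoKind::Geography { algorithm: "SPHERICAL" },
        Some("vincenty") => ParquetGeoKind::Geography { algorithm: "VINCENTY" },
        Some("thomas") => ParquetGeoKind::Geography { algorithm: "THOMAS" },
        Some("andoyer") => ParquetGeoKind::Geography { algorithm: "ANDOYER" },
        Some("karney") => ParquetGeoKind::Geography { algorithm: "KARNEY" },
        Some(other) => return Err(format!("unsupported edges value {other:?}")),
    };
    Ok(Some(kind))
}
```

Rejecting unknown `"edges"` values, rather than silently writing GEOMETRY, seems the safer default here because writing planar edges for data meant to be interpreted on the sphere would change its meaning.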
Right now GeoArrow extension types are listed as "community extension types", which I believe was a category made up just for us. It may be that voting to move geoarrow.wkb into the "canonical extension type" category is a precursor to finalizing this implementation, which is definitely fair 🙂.
I'm happy to attempt this when I get a chance (unless @kylebarron is chomping at the bit to do it or has already done it!).