Skip to content

Integrate with rioxarray and implement GMT_IMAGE to read RGB GeoTIFFs #1555

@weiji14

Description

@weiji14

Description of the desired feature

This is a parallel issue to #578 and #1442. While researching how to read GeoTIFFs into an xarray.DataArray, specifically GeoTIFFs with RGB bands stored in a single band but with a color lookup table, I discovered that there wasn't really a straightforward solution 😅 So this issue is mainly to jot down some notes.

1) Reading GeoTiff into xarray.DataArray

Method A: rioxarray.open_rasterio

Pros: Reads in data with proper dtype (e.g. uint8) and proper spatial_ref
Cons: rioxarray is a bit of a heavy dependency as it requires scipy (edit: may be resolved with corteva/rioxarray#413, edit2: rioxarray=0.8.0 has made scipy optional!)

import pygmt
import rioxarray

filename_or_obj = pygmt.which(fname="@earth_day_01d", download="a")

with rioxarray.open_rasterio(filename_or_obj) as dataarray:
    print(dataarray)

gives

<xarray.DataArray (band: 1, y: 180, x: 360)>
array([[[251, 251, ..., 251, 251],
        [251, 252, ..., 251, 252],
        ...,
        [ 68,  16, ...,  68, 217],
        [149, 149, ..., 149,  16]]], dtype=uint8)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * y            (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
    spatial_ref  int64 0
Attributes:
    scale_factor:  1.0
    add_offset:    0.0

Method B: xr.open_dataarray(..., engine="rasterio")

Pros: Requires only rasterio (a Pythonic GDAL wrapper) to be installed, which keeps things lightweight Easier to put into pygmt.load_dataarray as it only requires the 'engine' argument
Cons: Seems to read in data as float32 by default

Note: There was an xr.open_rasterio function in xarray v0.19.0, but it is pending deprecation, see pydata/xarray#4697, and removal PR at pydata/xarray#5808.

with xr.open_dataarray(filename_or_obj) as dataarray:
    print(dataarray)

produces

<xarray.DataArray 'band_data' (band: 1, y: 180, x: 360)>
array([[[251., 251., ..., 251., 251.],
        [251., 252., ..., 251., 252.],
        ...,
        [ 68.,  16., ...,  68., 217.],
        [149., 149., ..., 149.,  16.]]], dtype=float32)
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * y            (y) float64 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
    spatial_ref  int64 ...

TLDR My preference is to use (A) rioxarray.open_rasterio because it reads in data properly as uint8, with a fallback to (B) xr.open_dataarray(..., engine=rasterio) if only rasterio is installed.

2) Reading RGB colors from a GeoTIFF

Here's where things get a bit tricky. Neither rioxarray.open_rasterio nor xarray.open_dataarray reads in GeoTIFFs with color interpretation, i.e. RGB colors stored in a GeoTIFF will be read as a single band rather than 3 separate bands! This includes the @earth_day_01d and @earth_night_01d imagery. To be clear, non-color GeoTIFFs (e.g. DEMs, single band Landsat/Sentinel-2 images) are fine.

Good news is that the color interpretation mechanism is present in rasterio see https://github.com/mapbox/rasterio/blob/1.2.8/docs/topics/color.rst and rasterio/rasterio#100. This is wrapping around the GDAL Color Table stuff (https://gdal.org/user/raster_data_model.html#color-table)

Bad news is that we'll need to have that mechanism in rioxarray or xarray in order to use it (unless we want to go with a pure rasterio route). There is an issue open though at corteva/rioxarray#191, so it might be a matter of helping to open a Pull Request in those libraries to implement the feature.

3) Implementing GMT_IMAGE in PyGMT

So once we've solved (1) and (2) above, the next step would be, how to pass that 3-band RGB color xarray.DataArray into GMT? Max mentioned at #578 (comment) that we might need to use the GMT_IMAGE resource, but that hasn't been added into pygmt.clib as of PyGMT v0.4.1.

Probably need something like a virtualfile_from_image function, similar to virtualfile_from_grid (done in #159) at https://github.com/GenericMappingTools/pygmt/blame/v0.4.1/pygmt/clib/session.py#L1278

Are you willing to help implement and maintain this feature? Something to do for PyGMT v0.6.0 I think.

The steps I think would be to:

References:

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions