Distributed open_rasterio read error when URL contains permissions #3489

@system123

MCVE Code Sample

I have a GeoTIFF stored in an S3 bucket and accessible via a URL that contains authentication parameters. When opening the file with xarray.open_rasterio I am able to read the file's metadata and perform computations as expected. However, if I try to run these computations across a Dask LocalCluster or KubeCluster, I can only read the metadata; the computations fail with a 403 error.

from dask.distributed import Client, LocalCluster
import xarray as xr

url = "https://dataset.s3.us-west-2.amazonaws.com/mosaic-dir/SAR-mosaic.tif?AWSAccessKeyId=XXXXXXXX&Expires=1573079289&Signature=XXXXXXXXXX&x-amz-security-token=XXXXX....."

client = Client(LocalCluster())
ds = xr.open_rasterio(url, chunks=5000)
ds.mean().compute()
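
A 403 on a presigned URL can have several causes, and nothing in this report confirms which one applies. One cheap thing to rule out is that the URL had already expired by the time the read was attempted. The sketch below is a minimal check using only the standard library. It assumes the query-string signing style shown in the URL above, where `Expires` is an absolute Unix timestamp in seconds. The helper names `presigned_url_expiry` and `is_expired` are hypothetical and are not part of xarray, rasterio, or Dask.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url):
    """Return the URL's expiry as an aware UTC datetime, or None.

    Assumption: ``Expires`` is an absolute Unix timestamp in seconds,
    as in the URL from the report.
    """
    query = parse_qs(urlsplit(url).query)
    values = query.get("Expires")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url, now=None):
    """True if the URL has an ``Expires`` value that is already in the past."""
    expiry = presigned_url_expiry(url)
    if expiry is None:
        # No expiry parameter found, so this check cannot say anything.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


# Placeholder URL with the same shape as the one in the report.
example = (
    "https://dataset.s3.us-west-2.amazonaws.com/mosaic-dir/SAR-mosaic.tif"
    "?AWSAccessKeyId=XXXXXXXX&Expires=1573079289&Signature=XXXXXXXXXX"
)
print(presigned_url_expiry(example))
print(is_expired(example))
```

If the URL was signed with temporary credentials, as the `x-amz-security-token` parameter suggests, it may stop working before `Expires` once those credentials lapse, so a passing check here does not prove the URL is still valid.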

Output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    197             try:
--> 198                 file = self._cache[self._key]
    199             except KeyError:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<function open at 0x7f18da7ba488>, ('MY URL',), 'r', ()]

During handling of the above exception, another exception occurred:

CPLE_HttpResponseError                    Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_shim.pyx in rasterio._shim.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_HttpResponseError: HTTP response code: 403

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
<ipython-input-2-ee58a2575c0a> in <module>
      1 client = Client(LocalCluster())
      2 url = "MY URL"
----> 3 ds = xr.open_rasterio(url, chunks=5000)
      4 ds.mean().compute()

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/rasterio_.py in open_rasterio(filename, parse_coordinates, chunks, cache, lock)
    237 
    238     manager = CachingFileManager(rasterio.open, filename, lock=lock, mode="r")
--> 239     riods = manager.acquire()
    240     if vrt_params is not None:
    241         riods = WarpedVRT(riods, **vrt_params)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/file_manager.py in acquire(self, needs_lock)
    178         An open file object, as returned by ``opener(*args, **kwargs)``.
    179         """
--> 180         file, _ = self._acquire_with_cache_info(needs_lock)
    181         return file
    182 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    202                     kwargs = kwargs.copy()
    203                     kwargs["mode"] = self._mode
--> 204                 file = self._opener(*self._args, **kwargs)
    205                 if self._mode == "w":
    206                     # ensure file doesn't get overriden when opened again

/srv/conda/envs/notebook/lib/python3.7/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    443 
    444         with env_ctor(session=session):
--> 445             return f(*args, **kwds)
    446 
    447     return wrapper

/srv/conda/envs/notebook/lib/python3.7/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    214         # None.
    215         if mode == 'r':
--> 216             s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
    217         elif mode == 'r+':
    218             s = get_writer_for_path(path)(path, mode, driver=driver, sharing=sharing, **kwargs)

rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

RasterioIOError: HTTP response code: 403
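
A note on reading the traceback above: the leading KeyError appears to be an ordinary cache miss rather than the failure itself. The frames from file_manager.py show a lookup in the cache, an `except KeyError:` branch, and then a call to the opener, which is where the 403 is raised. The sketch below reproduces that try-cache-then-open pattern to show why Python prints "During handling of the above exception, another exception occurred". `TinyFileManager` and `failing_open` are hypothetical stand-ins written for this illustration; this is not xarray's actual implementation.

```python
class TinyFileManager:
    """Hypothetical stand-in for a caching file manager (not xarray's class)."""

    def __init__(self, opener, *args):
        self._opener = opener
        self._args = args
        self._cache = {}
        self._key = (opener, args)

    def acquire(self):
        try:
            return self._cache[self._key]
        except KeyError:
            # Cache miss: open the file now. If the opener raises here,
            # Python chains the new exception onto the KeyError being
            # handled, which is why both show up in the traceback.
            file = self._opener(*self._args)
            self._cache[self._key] = file
            return file


def failing_open(url):
    """Hypothetical opener that always fails like the request in the report."""
    raise OSError("HTTP response code: 403")


try:
    TinyFileManager(failing_open, "MY URL").acquire()
except OSError as err:
    # The KeyError is only the context; the OSError is the real failure.
    print(type(err.__context__).__name__)
    print(err)
```

Under this reading, the error worth investigating is the last one in the chain (the 403 from the open call), not the KeyError at the top.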

Expected Output

<xarray.DataArray 'Band1' ()>
array(2681.77006093)
Coordinates:
    pol      <U2 'VV'

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.4.0-1079-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2

xarray: 0.14.0
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.25
cfgrib: None
iris: 2.2.0
bottleneck: 1.2.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.6.0.post20191029
pip: 19.3.1
conda: None
pytest: None
IPython: 7.9.0
sphinx: None
