@TomAugspurger

This adds a "pc" fsspec filesystem implementation, which lets us
insert "pc::" in an fsspec URL and automatically sign it when loading
it with an fsspec client.

The primary motivation is integration with fsspec's reference filesystem, where
users would otherwise need to call planetary_computer.sign in multiple places:

  1. Once for loading the index JSON files
  2. Once for signing the reference filesystem templates

This lets us replace the following:

>>> result = xr.open_dataset(
...     fsspec.get_mapper(
...         "reference://",
...         fo=planetary_computer.sign(requests.get(planetary_computer.sign("https://deltaresreservoirssa.blob.core.windows.net/references/reservoirs/chirps.json")).json()),
...     ),
...     engine="zarr",
...     consolidated=False,
... )

With this:

>>> result = xr.open_dataset(
...     "pc::reference::pc::https://deltaresreservoirssa.blob.core.windows.net/references/reservoirs/CHIRPS.json",
...     engine="zarr",
...     consolidated=False,
... )
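
As a rough illustration of the second signing step (the reference filesystem templates), here is a minimal sketch. It assumes the index JSON follows the version 1 reference layout with a top-level "templates" mapping. `fake_sign` is a hypothetical stand-in for `planetary_computer.sign` so the snippet runs offline, and `sign_templates` is an illustrative helper, not something this PR adds.

```python
import copy


def fake_sign(url: str) -> str:
    """Hypothetical stand-in for planetary_computer.sign: appends a dummy token."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}sig=PLACEHOLDER"


def sign_templates(index: dict, sign=fake_sign) -> dict:
    """Return a copy of a reference index with every template URL signed.

    Assumes `index` uses a top-level "templates" mapping of name -> URL.
    """
    signed = copy.deepcopy(index)
    if "templates" in signed:
        signed["templates"] = {
            name: sign(url) for name, url in signed["templates"].items()
        }
    return signed


# Made-up index for illustration; the URL is not a real dataset.
index = {
    "version": 1,
    "templates": {"a": "https://example.blob.core.windows.net/container/file.nc"},
    "refs": {"precip/0.0": ["{{a}}", 0, 100]},
}
signed_index = sign_templates(index)
```

Only the template URLs change here; the "refs" entries still point at the template names, and the input dictionary is left unmodified.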

Still just a POC. I need to figure out:

  1. Better tests.
  2. If there's a way to modify the references earlier.

fo = planetary_computer.sign(fo)
self.fo = fo
self.target_fs = fsspec.filesystem(self.target_protocol, **self.target_options)
if isinstance(self.target_fs, fsspec.implementations.reference.ReferenceFileSystem):
I'm not a fan of this block.

The reference filesystem has the idea of "template" URLs, which are the NetCDF files in blob storage. We want to sign those URLs before anyone attempts to access data via this reference filesystem.

It seems that the reference filesystem's `__init__` calls a method at https://github.com/fsspec/filesystem_spec/blob/7effb83e8ab31010ec5796c14193b5fcd5774e05/fsspec/implementations/reference.py#L149, which does a lot of work, including inlining the template URLs into the reference (url, start, end) tuples. Unfortunately, we don't have a way to update the template URLs before the tuples are built, so we have to sign the URLs a second time once the tuples exist.
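
To make the "do it again" step concrete, here is a minimal sketch of signing the URLs inside already-built reference entries. It assumes the references are a plain dictionary whose values are either inline data (strings) or `[url, start, end]` sequences. `fake_sign` is a hypothetical stand-in for `planetary_computer.sign`, and `sign_reference_entries` is an illustrative name rather than the code in this PR.

```python
from typing import Callable


def fake_sign(url: str) -> str:
    """Hypothetical stand-in for planetary_computer.sign: appends a dummy token."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}sig=PLACEHOLDER"


def sign_reference_entries(references: dict, sign: Callable[[str], str]) -> dict:
    """Return a new mapping with the URL of each [url, start, end] entry signed."""
    signed = {}
    for key, value in references.items():
        if isinstance(value, (list, tuple)) and value and isinstance(value[0], str):
            # Entry points at a byte range in a remote file: sign only the URL.
            signed[key] = [sign(value[0]), *value[1:]]
        else:
            # Inline data has no URL, so it passes through unchanged.
            signed[key] = value
    return signed


# Made-up references for illustration; the URL is not a real dataset.
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "precip/0.0": ["https://example.blob.core.windows.net/container/file.nc", 0, 100],
}
signed_refs = sign_reference_entries(refs, fake_sign)
```

This walks every entry, which is the duplicated work the comment above complains about: the templates were already expanded into each tuple, so each tuple has to be visited individually.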
