Support chunking multiple assets together in the time/band dimensions #106

@gjoseph92

Currently, stackstac is built around each STAC Asset being its own chunk in the dask array—the time and band dimensions always have a chunksize of 1.

However, there are cases where you might want to load multiple Assets in one chunk of the array. Most commonly, you'd do this when you have a huge graph, need to cut down on tasks, and can give up some granularity. Particularly, you might be happy to combine the time dimension into fewer chunks if you know you're doing a composite right away anyway. See microsoft/PlanetaryComputer#12 (comment) for a motivating example.

So let's support extending the chunksize= argument to stackstac.stack to take up to 4-tuples (time, band, y, x), so you can specify the chunking along all dimensions.

Note that this isn't #66 (though that could be a follow-on): we're not talking about flattening/pre-mosaicing the data. We'd still load every asset as usual, it's just that the chunks of the dask array might be (4, 2, Y, X) instead of always (1, 1, Y, X).
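To make the proposal concrete, here is a rough sketch of how a `chunksize=` value might be expanded to a full `(time, band, y, x)` tuple, and how much that cuts the chunk count. Both `normalize_chunksize` and `count_chunks` are hypothetical helpers, not part of stackstac, and filling shorter tuples from the right (so a 2-tuple means `(y, x)`) is an assumption about how the argument could behave.

```python
import math
from typing import Sequence, Tuple, Union

ChunkArg = Union[int, Sequence[int]]


def normalize_chunksize(chunksize: ChunkArg) -> Tuple[int, ...]:
    """Hypothetical helper: expand `chunksize` to (time, band, y, x).

    Assumption: an int means square spatial chunks, and shorter tuples
    describe the trailing dimensions, so missing leading dimensions
    (time, then band) default to 1, matching the current behaviour.
    """
    if isinstance(chunksize, int):
        sizes = (chunksize, chunksize)
    else:
        sizes = tuple(chunksize)
    if not 2 <= len(sizes) <= 4:
        raise ValueError(f"chunksize must be an int or a 2- to 4-tuple, got {chunksize!r}")
    if any(s < 1 for s in sizes):
        raise ValueError(f"chunk sizes must be positive, got {chunksize!r}")
    # Pad on the left with 1s for the dimensions that were not given.
    return (1,) * (4 - len(sizes)) + sizes


def count_chunks(shape: Sequence[int], chunks: Sequence[int]) -> int:
    """Number of chunks an array of `shape` splits into with `chunks`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# 8 timesteps, 4 bands, 2048 x 2048 pixels
shape = (8, 4, 2048, 2048)
per_asset = count_chunks(shape, normalize_chunksize(1024))
combined = count_chunks(shape, normalize_chunksize((4, 2, 1024, 1024)))
```

In this made-up example, going from `(1, 1, 1024, 1024)` to `(4, 2, 1024, 1024)` chunks reduces the chunk count by a factor of 8, while every asset is still read in full.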

This should be done/considered as a part of #105.

Questions:

  • When a chunk contains multiple assets, should they be loaded serially, or in parallel? We could create our own internal threadpool, since most of the IO is not CPU-bound. However, because we have to duplicate the GDAL Dataset and file-descriptor per-thread, that might be expensive on memory. I suppose the runtime of T threads reading N assets is the same as T threads reading N / C assets, where each read takes C times longer. So probably in serial. Sure would be nice to just have an aiocogeo Reader for this 😁
  • How will combining multiple bands into a single chunk interplay with #62 (Support multi-band COGs)?
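As a sketch of the two options in the first question, here is what serial versus thread-pooled loading of one multi-asset chunk could look like. Everything here is hypothetical: `fake_read` stands in for the real GDAL read, the function names are made up, and the per-thread dataset duplication mentioned above is not modeled at all.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Tile = List[List[float]]


def fake_read(asset_url: str) -> Tile:
    """Stand-in for reading one asset's window; returns a tiny 2x2 tile."""
    return [[float(len(asset_url))] * 2 for _ in range(2)]


def load_chunk_serial(
    asset_urls: Sequence[str], read: Callable[[str], Tile] = fake_read
) -> List[Tile]:
    """Read every asset in the chunk one after another."""
    return [read(url) for url in asset_urls]


def load_chunk_threaded(
    asset_urls: Sequence[str],
    read: Callable[[str], Tile] = fake_read,
    max_workers: int = 4,
) -> List[Tile]:
    """Read the chunk's assets on an internal thread pool.

    Executor.map yields results in input order, so the output lines up
    with `asset_urls` regardless of which read finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read, asset_urls))


urls = ["s3://bucket/a.tif", "s3://bucket/bb.tif", "s3://bucket/ccc.tif"]
serial_tiles = load_chunk_serial(urls)
threaded_tiles = load_chunk_threaded(urls)
```

Both variants return the same tiles in the same order. The difference would only be in wall-clock time and memory, which is the tradeoff the question is about.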
