Migrating to zarr v3 #4014

@JulienBrn

Description

Currently, SpikeInterface uses zarr v2, whose API is not compatible with zarr v3; v3 offers different features (mainly sharding).
My motivation for sharding is that I like being able to load a single channel quickly, so I usually set channel_chunk_size fairly low, but by doing so I end up with a huge number of files (384x2x3600 for a 2h recording...).

I've tried to migrate the code of zarrextractors.py, and here are my conclusions:

  • migrating to v3 is mostly easy: change group.create_dataset(key=name, ..., data=data) to group.create_array(key=name, ..., shape=data.shape, dtype=data.dtype) followed by group[name][:] = data
  • we have a problem with structured numpy arrays, which were handled in v2 but are not in v3. Support seems to be in the process of being added to zarr, see this PR. We can either rely on that future implementation or define a custom convention for handling them. One solution would be to create a group with a name like "_structuredarraygrp[name]" and put each individual field array in it.
  • However, reading the structured arrays back is less obvious: either we give up zarr's laziness and just build the numpy array from the group each time, or we need to find a way to provide a lazy array that handles structured dtypes. Perhaps by using dask?
  • adding sharding is extremely easy, as it's just a parameter of create_array. However, the functions that process arguments need to be modified.

Is there enough interest in such a migration for me to submit a PR?

Metadata


Labels: core (Changes to core module)
