Skip to content

basic support for v2 and v3 groups #1590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Dec 5, 2023

This begins the work of harmonizing the V2 and V3 group APIs from Zarrita with the Zarr-Python Group API.

I'll leave some comments inline.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@pep8speaks
Copy link

pep8speaks commented Dec 5, 2023

Hello @jhamman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 30:1: W293 blank line contains whitespace

Comment last updated at 2023-12-06 15:13:06 UTC

Copy link
Member Author

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts below 👇



@frozen
class GroupMetadata:
attributes: Dict[str, Any] = field(factory=dict)
zarr_format: Literal[3] = 3
node_type: Literal["group"] = "group"
zarr_format: Literal[2, 3] = 3 # field(default=3, validator=validators.in_([2, 3]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanrz - I tried this and failed! Happy to return to it later.


@abstractmethod
def __setitem__(self, key: str, value: Union[SyncArray, "SyncGroup"]) -> None:
"""get child"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""get child"""

Proposing that we do away with Group.__setitem__ all together. It really doesn't make sense to include in the api.

@abstractmethod
def group_keys(self) -> AsyncIterator[str]:
"""iterate over child group keys"""
...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these iterators need listable stores before they can be built.

@frozen
class SyncConfiguration:
concurrency: Optional[int] = None
asyncio_loop: Optional[AbstractEventLoop] = None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a half hearted attempt at rethinking the config topic. I expect this to go away soon.

zarr/v3/group.py Outdated
Comment on lines 317 to 334
def __setitem__(self, key, value):
raise NotImplementedError
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __setitem__(self, key, value):
raise NotImplementedError

zarr/v3/group.py Outdated
return self._sync(self._async_group.nchildren)

@property
def children(self) -> List[Array, "Group"]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some exploratory work is needed to figure out how to make this (and similar methods) a proper iterator/generator instead of a list. The tricky bit is that the async method is returning an AsyncIterator and we need to synchronize that into a regular Iterator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that there is only 1 Group and AsyncGroup. Do we actually want to keep the Group ABC?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-v-b and I were just talking about that last night. I think the answer is NO. For ABCs, I we probably only need the Store and Codec base classes.

# TODO: consider trying to autodiscover the zarr-format here
if zarr_format == 3:
# V3 groups are comprised of a zarr.json object
# (it is optional in the case of implicit groups)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are implicit groups really a thing? I don't think they show up in the v3 spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do show up in the spec (in quite a few places actually)! https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#explicit-vs-implicit-groups

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that!

if self.zarr_format == 3:
zarr_json_bytes = await (store_path / ZARR_JSON).get_async()
if zarr_json_bytes is None:
# implicit group?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit dangerous because we have no good way of telling whether this path even exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree and we should return to this. For now I'm going to put a log statement there and we can address this later.

store_path, zarr_json, runtime_configuration=self.runtime_configuration
)
elif self.zarr_format == 2:
# Q: how do we like optimistically fetching .zgroup, .zarray, and .zattrs?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overfetching is fine because it is happening in parallel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this issue the other day. One possible behavior when you "open" anything in Zarr would be to call the store.tree(depth=1) method, fetching all existing metadata docs at the root level and one level below.

- removed abcs for groups/arrays
- improved return types in group.py
- warn (temporarily) when an implicit group is found
- add attributes.py with Attributes class
@jhamman jhamman force-pushed the basic-v2-and-v3-group-support branch from 888551a to 84b497f Compare December 6, 2023 15:24
@jhamman jhamman marked this pull request as ready for review December 6, 2023 16:52
elif self.zarr_format == 2:
return {
ZGROUP_JSON: self.zarr_format,
ZATTRS_JSON: json.dumps(self.attributes).encode(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would custom JSON serialization of user attributes plug in here? e.g., i don't think numpy dtype objects serialize to JSON easily. Maybe to_bytes can take a attrs_serializer kwarg or something.

def create_array(self, name: str, **kwargs) -> Array:
return Array(self._sync(self._async_group.create_array(name, **kwargs)))

def empty(self, **kwargs) -> Array:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on ultimately putting these routines in a dedicated namespace?

try:
if self.store == other.store and self.path == other.path:
return True
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably be a bit more specific about the exception type here

@jhamman jhamman closed this Feb 7, 2024
@jhamman jhamman mentioned this pull request Mar 30, 2024
6 tasks
@jhamman jhamman added this to the 3.0.0.alpha milestone Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants