-
-
Notifications
You must be signed in to change notification settings - Fork 349
basic support for v2 and v3 groups #1590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jhamman! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2023-12-06 15:13:06 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts below 👇
|
||
|
||
@frozen | ||
class GroupMetadata: | ||
attributes: Dict[str, Any] = field(factory=dict) | ||
zarr_format: Literal[3] = 3 | ||
node_type: Literal["group"] = "group" | ||
zarr_format: Literal[2, 3] = 3 # field(default=3, validator=validators.in_([2, 3])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@normanrz - I tried this and failed! Happy to return to it later.
zarr/v3/abc/group.py
Outdated
|
||
@abstractmethod | ||
def __setitem__(self, key: str, value: Union[SyncArray, "SyncGroup"]) -> None: | ||
"""get child""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""get child""" |
Proposing that we do away with Group.__setitem__
all together. It really doesn't make sense to include in the api.
zarr/v3/abc/group.py
Outdated
@abstractmethod | ||
def group_keys(self) -> AsyncIterator[str]: | ||
"""iterate over child group keys""" | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of these iterators need listable stores before they can be built.
@frozen | ||
class SyncConfiguration: | ||
concurrency: Optional[int] = None | ||
asyncio_loop: Optional[AbstractEventLoop] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a half hearted attempt at rethinking the config topic. I expect this to go away soon.
zarr/v3/group.py
Outdated
def __setitem__(self, key, value): | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __setitem__(self, key, value): | |
raise NotImplementedError |
zarr/v3/group.py
Outdated
return self._sync(self._async_group.nchildren) | ||
|
||
@property | ||
def children(self) -> List[Array, "Group"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some exploratory work is needed to figure out how to make this (and similar methods) a proper iterator/generator instead of a list. The tricky bit is that the async method is returning an AsyncIterator and we need to synchronize that into a regular Iterator.
zarr/v3/abc/group.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that there is only 1 Group and AsyncGroup. Do we actually want to keep the Group ABC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-v-b and I were just talking about that last night. I think the answer is NO. For ABCs, I we probably only need the Store and Codec base classes.
# TODO: consider trying to autodiscover the zarr-format here | ||
if zarr_format == 3: | ||
# V3 groups are comprised of a zarr.json object | ||
# (it is optional in the case of implicit groups) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are implicit groups really a thing? I don't think they show up in the v3 spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do show up in the spec (in quite a few places actually)! https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#explicit-vs-implicit-groups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that!
if self.zarr_format == 3: | ||
zarr_json_bytes = await (store_path / ZARR_JSON).get_async() | ||
if zarr_json_bytes is None: | ||
# implicit group? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit dangerous because we have no good way of telling whether this path even exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree and we should return to this. For now I'm going to put a log statement there and we can address this later.
store_path, zarr_json, runtime_configuration=self.runtime_configuration | ||
) | ||
elif self.zarr_format == 2: | ||
# Q: how do we like optimistically fetching .zgroup, .zarray, and .zattrs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overfetching is fine because it is happening in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this issue the other day. One possible behavior when you "open" anything in Zarr would be to call the store.tree(depth=1)
method, fetching all existing metadata docs at the root level and one level below.
- removed abcs for groups/arrays - improved return types in group.py - warn (temporarily) when an implicit group is found - add attributes.py with Attributes class
888551a
to
84b497f
Compare
elif self.zarr_format == 2: | ||
return { | ||
ZGROUP_JSON: self.zarr_format, | ||
ZATTRS_JSON: json.dumps(self.attributes).encode(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would custom JSON serialization of user attributes plug in here? e.g., i don't think numpy dtype objects serialize to JSON easily. Maybe to_bytes
can take a attrs_serializer
kwarg or something.
def create_array(self, name: str, **kwargs) -> Array: | ||
return Array(self._sync(self._async_group.create_array(name, **kwargs))) | ||
|
||
def empty(self, **kwargs) -> Array: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts on ultimately putting these routines in a dedicated namespace?
try: | ||
if self.store == other.store and self.path == other.path: | ||
return True | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can probably be a bit more specific about the exception type here
This begins the work of harmonizing the V2 and V3 group APIs from Zarrita with the Zarr-Python Group API.
I'll leave some comments inline.
TODO: