-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
added to_dict function for xarray objects #917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
xarray/core/common.py
Outdated
d['coords'].update({k: {'data': self[k].values.tolist(), | ||
'dims': list(self[k].dims), | ||
'attrs': dict(self[k].attrs)}}) | ||
if hasattr(self, 'data_vars'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These sorts of checks are best avoided, if possible. I would simply write two implementations of to_dict
, one on Dataset
and one on DataArray
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I changed it below.
This looks like a great start, but to be really useful it needs a couple other things:
Please ask if you need guidance on where to start for any of these! This will be a very welcome addition to xarray :). |
Ok, I wrote a
Regarding unit tests, I haven't ever written one before but I would be happy to try (I know they are an important part of good development). |
…ghlight roundtripping issue
Ok, I have committed a first pass at unit tests. I purposefully wrote a failing time test. |
I think the datetime issue is a numpy bug related to numpy/numpy#7619 We can work around this by casting to
The same issue holds for timedelta:
We can work around these by checking for timedelta64 or datetime64 dtypes (use |
xarray/test/test_dataset.py
Outdated
} | ||
}, | ||
'attrs': {}, | ||
'dims': ['t'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe save dims
on a dataset as a dict instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the value of that? For DataArray
it will have to be a list or an OrderedDict
anyways so that the shape of the data matches the shape of the dims.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true -- dims
is inconsistent between Dataset
and DataArray
. But short of switching dims
on DataArray
to an OrderedDict (maybe not a bad idea, but a separate discussion), I think the serialization format should be consistent with xarray's data model.
One big thing that |
xarray/core/dataset.py
Outdated
""" | ||
Convert a dictionary into an xarray.Dataset. | ||
""" | ||
obj = cls() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the way that Dataset works internally, it can be much slower (for a large number of variables) to add them incrementally to a Dataset rather than to call the Dataset constructor once. Also, the behavior could differ, because the variables are aligned incrementally rather than all at once.
So I would consider building up ordered dictionaries for data_vars
, coords
and attrs
and calling cls(data_vars, coords, attrs)
at the end.
It would also be good to test roundtripping arrays with some |
Ok, I made the changes that you suggested. I still need to work on the |
@shoyer I think I have done all the things that you mentioned and I added |
xarray/core/dataarray.py
Outdated
try: | ||
coords = OrderedDict([(var[0], (var[1]['dims'], | ||
var[1]['data'], | ||
var[1].get('attrs'))) for var in d['coords'].items()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple unpacking is more readable than using indexing: for k, v in d['coords'].items()
Ok, so I merged with the current master and added documentation and a what's-new note. I hope I did that right. @shoyer I am really new to contributing so thanks for all your help. Let me know if anything needs changing. |
remains unchanged. Because the internal design of xarray is still being | ||
refined, we make no guarantees (at this point) that objects pickled with | ||
this version of xarray will work in future versions. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a more specific target here, e.g.,
.. _dictionary io:
I have a few minor suggestions, but otherwise this looks very nice! To ensure that the documentation subpages are built for these methods (which makes the sphinx link work), you need to add them to If you haven't tested the docs locally with |
xarray/core/dataarray.py
Outdated
'attrs': {'title': 'air temperature'}, | ||
'dims': 't', | ||
'data': x, | ||
d = {'coords': {'t': {'dims': 't', 'data': t, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this bad form? It renders more cleanly in the html docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use ::
and indentation, which will format this as code in the HTML docs: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt#other-points-to-keep-in-mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The \n
and the like are a little bad in the docstring because they aren't valid Python code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense
I made the docs locally and they look how I expect. |
Done! |
Thank you @jsignell -- really nice work here! |
@jsignell was the intention to have only python scalars/lists in the dictionary (no numpy generics or arrays)? Right now attribute values are not being converted and results in a mixed |
@kwilcox, I hadn't thought much about this. I guess the intention was to have only python scalars/lists, but I don't know if we want to get in the business of converting attribute values since there are so many options for what they could be. What do you think makes the most sense? |
Before I looked at the code I assumed it was going to convert... but that's just me! As it stands a custom encoder is needed to get a JSON dump from the output, which will be a fairly common use case for this function. See https://gist.github.com/kwilcox/c41834297b1a3b732cae3ee16621f6d0. In the least maybe a little note in the documentation on how to dump the |
I think we should try to convert numpy scalars/arrays, because otherwise the data won't convert directly to JSON. Possibly could take a duck typing approach of looking for |
We should recommend |
@shoyer is this something I should be working on? I am happy to, just don't know how this normally goes. |
Up to you, but yes it would be greatly appreciated if you could work on On Mon, Oct 17, 2016 at 12:15 PM, Julia Signell [email protected]
|
After the conversation #432