Skip to content

Conversation

TomNicholas
Copy link
Member

This seems to fix the examples in #9196, but causes other tests to fail, which I think just reveals that we have existing tests that assume this in-place modification is desired. I will change those tests to match the new intended behaviour.

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Jul 31, 2024
@TomNicholas TomNicholas marked this pull request as ready for review July 31, 2024 06:07
@TomNicholas TomNicholas marked this pull request as draft July 31, 2024 06:08
self.children = children

# shallow copy to avoid modifying arguments in-place (see GH issue #9196)
self.parent = parent.copy() if parent is not None else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove parent from the constructor instead?

It's a little weird to only be able to set the direct parent (and corresponding sibling), but not more distant relatives.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was going to do that in a follow-up PR, but I can do it in this one instead.

@TomNicholas
Copy link
Member Author

TomNicholas commented Aug 2, 2024

The main delay here is just how some of the tests are written - a lot of them rely upon in-place modification to work. e.g.

def test_path_property(self):

This change makes the constructor a somewhat unhelpful way to create trees from scratch. I could rewrite these tests either

  1. The ugly way I have started in 5db25bd, which doesn't refer to copied objects,

  2. Using .from_dict, which is much clearer (and the recommended user API), but relies on more complex code paths internally so doesn't isolate the behaviour of the constructor / relationship properties so much.

@flamingbear
Copy link
Member

  1. The ugly way I have started in 5db25bd, which doesn't refer to copied objects,

When you say it's ugly, it's what I was thinking. It seems closer to what you're testing.
This looks like a not super useful test.

john = DataTree.from_dict(
    {
        "/": DataTree(name="John"),
        "/Mary": DataTree(name="Mary"),
        "/Mary/Sue": DataTree(name="Sue"),
    }
)
assert john["/Mary/Sue"].path == "/Mary/Sue"
 

@shoyer
Copy link
Member

shoyer commented Aug 2, 2024

I think using from_dict for tests is fine, as long as you thoroughly test the constructor as well.

You could also avoid all the intermediate variables by writing something like:

john = DataTree(children={"Mary": DataTree(children={"Sue": DataTree()})})

Incidentally, it would sure be nice if we could eventually support automatic unpacking of nested dictionaries in the DataTree constructor, e.g., something like:

john = DataTree({"Mary": {"Sue": {}}})

Here the nature of the node (child DataTree vs DataArray) could be inferred from the argument type.

@shoyer
Copy link
Member

shoyer commented Aug 2, 2024

john = DataTree.from_dict(
    {
        "/": DataTree(name="John"),
        "/Mary": DataTree(name="Mary"),
        "/Mary/Sue": DataTree(name="Sue"),
    }
)
assert john["/Mary/Sue"].path == "/Mary/Sue"
 ``

Incidentally, I would try to avoid redundant name arguments, given how name is (or should be) overriden by the assigned key in the parent object. So I would wrtie this as:

john = DataTree.from_dict(
    {
        "/": DataTree(),
        "/Mary": DataTree(),
        "/Mary/Sue": DataTree(),
    }
)

or even just

john = DataTree.from_dict(
    {
        "/Mary/Sue": DataTree(),
    }
)

@TomNicholas
Copy link
Member Author

I've now re-written the tests to use approach (2), i.e. to just use DataTree.from_dict everywhere that a nested tree is required.

@TomNicholas TomNicholas marked this pull request as ready for review August 27, 2024 15:34
flamingbear and others added 7 commits September 2, 2024 14:38
I will happily take something better.

But this was the error I was getting

xarray/tests/test_datatree.py:127: error: Argument 1 to "relative_to" of
"NamedNode" has incompatible type "DataTree[Any] | DataArray"; expected
"NamedNode[Any]"  [arg-type]
But it doesn't fix all the tests There's three tests that I don't fully know
what should be tested or if they still make sense.
@flamingbear
Copy link
Member

flamingbear commented Sep 3, 2024

@TomNicholas I didn't want to push up unfinished work to this PR again. But 7c530f5?diff=split&w=0
is my stab at removing the parent parameter from the DataTree Constructor. I got all but three tests to pass but it seems like the three might not make sense anymore (or I just don't fully understand them)

FAILED xarray/tests/test_datatree.py::TestFamilyTree::test_create_two_children - TypeError: DataTree.__init__() got an unexpected keyword argument 'parent'
FAILED xarray/tests/test_datatree.py::TestFamilyTree::test_dont_modify_parent_inplace - TypeError: DataTree.__init__() got an unexpected keyword argument 'parent'
FAILED xarray/tests/test_datatree.py::TestFamilyTree::test_setparent_unnamed_child_node_fails - TypeError: DataTree.__init__() got an unexpected keyword argument 'parent'

Update edited:
test_dont_modify_parent_inplace looks like might have a bug. And probably something I didn't fix when I removed the parent keyword from the constructor?
fe5ae9c

I think this is a test for test_setparent_unnamed_child_node_fails
9a004d4

still have no idea what test_create_two_children is doing

these are just 3 commits on https://github.com/flamingbear/xarray/tree/datatree_init_dont_modify_inplace

@TomNicholas
Copy link
Member Author

I think these are all resolved by either:

  1. Changing outdated tests
  2. Also making .parent read-only.

@TomNicholas
Copy link
Member Author

This PR now does a few related things:

  1. Shallow copy children in the DataTree constructor (6c56b12),
  2. Removes the parent argument from the DataTree constructor (7c530f5)
  3. Makes the .parent attribute of DataTree read-only (897b589)
  4. Changes any tests that these effect by making them pass or deleting them.

@TomNicholas
Copy link
Member Author

I believe this is ready to merge. It might imply more changes to @flamingbear 's docs PR though.

@shoyer
Copy link
Member

shoyer commented Sep 7, 2024

Looks great, thanks Tom!

@TomNicholas TomNicholas merged commit 68b040a into pydata:main Sep 7, 2024
28 checks passed
@TomNicholas TomNicholas deleted the datatree_init_dont_modify_inplace branch September 7, 2024 18:52
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (29 commits)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  DataTree should not be "Generic" (pydata#9445)
  Disallow passing a DataArray as data into the DataTree constructor (pydata#9444)
  Support additional dtypes in `resample` (pydata#9413)
  Shallow copy parent and children in DataTree constructor (pydata#9297)
  Bump minimum versions for dependencies (pydata#9434)
  Always include at least one category in random test data (pydata#9436)
  Avoid deep-copy when constructing groupby codes (pydata#9429)
  ...
hollymandel pushed a commit to hollymandel/xarray that referenced this pull request Sep 23, 2024
* add tests

* fix by shallow copying

* correct first few tests

* replace constructors in tests with DataTree.from_dict

* rewrite simple_datatree fixture to use DataTree.from_dict

* fix incorrect creation of nested tree in formatting test

* Update doctests for from_dict constructor

* swap i and h in doctest example for clarity.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix a few mypy errors.

* Bonkers way to set type checking

I will happily take something better.

But this was the error I was getting

xarray/tests/test_datatree.py:127: error: Argument 1 to "relative_to" of
"NamedNode" has incompatible type "DataTree[Any] | DataArray"; expected
"NamedNode[Any]"  [arg-type]

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removes parent keyword from DataTree constructor

But it doesn't fix all the tests There's three tests that I don't fully know
what should be tested or if they still make sense.

* fix test_setparent_unnamed_child_node_fails

* fix test_dont_modify_parent_inplace -> bug?

* fix test_create_two_children

* make .parent read-only, and remove tests which test the parent setter

* update error message to reflect fact that .children is Frozen

* fix another test

* add test that parent setter tells you to set children instead

* fix mypy error due to overriding settable property with read-only property

* fix test by not trying to set parent via kwarg

---------

Co-authored-by: Matt Savoie <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
Development

Successfully merging this pull request may close these issues.

3 participants