Skip to content

Conversation

@henryiii
Copy link
Member

@henryiii henryiii commented Jul 10, 2020

This supports .metadata being propagated through the histogram, and adds __slots__.

@henryiii
Copy link
Member Author

henryiii commented Jul 10, 2020

@HDembinski, I still need to add tests and make a few changes, but see what you think of this. It's roughly based on your idea in #400.

Axes simulate __dict__ access, while technically storing the metadata as before in the C++ axes. You can access the metadata directly with .metadata, or as normal attributes.

ax = Boolean(metadata={"title": "Hello", "name": "hello"})
assert ax.title == "Hello"
assert ax.name == "hello"
ax.label = "world"
assert ax.metadata == {"title: "Hello", "name": "hello", "label": "world"}

For backward compatibility, we still support non-dict metadata - it just disables the set/get via attributes. If the metadata is None, it will become a dict when you try to set it via attributes. Tab completion in IPython, etc. works. Since this is still in the C++ axes, it caries through slicing or any other axes modifying operation. If the metadata is not "nice", that is, not a string or the same name as a property, you need to use the metadata syntax to access it.

Histograms actually uses __dict__ directly, but still provides a .metadata to get a similar view for consistency. This causes a minor difference: only dict metadata's are supported. We could exactly mimic the axis system instead by storing in _metadata instead of __dict__, might be more consistent and provide the ability to put arbitrary metadata in, just like for axes.

What do you think?

@henryiii
Copy link
Member Author

henryiii commented Jul 10, 2020

Note that with this, we can provide the same API into Hist as boost-histogram. Hist will simply provide extra meaning/checks on some metadata items like name, but they can be present in boost-histogram as well.

The current metadata keyword is just a way for uproot, etc. to prepopulate the metadata when it makes the histogram/axes.

We could also populate metadata from **kwargs - possibly even leaving the actual .metadata to be just as arbitrary as any other item you add. This is a bit more restrictive to adding future keyword arguments, though, so I didn't add it. For example, Regular(name="x") would then be valid in both boost-histogram and Hist. But then it's more restrictive to future changes and additions; for example, if Hist adds Regular(flow=False), that would have a different meaning than in boost-histogram, where it would set the metadata flow item instead. I'm mildly against this; if metadata={"name":...} works in both, that's good enough for APIs.

@henryiii
Copy link
Member Author

We could go the other way too, I think - use the axes __dict__, and then copy the dict into the C++ axes before any operation that might change the axes, and then copy it back out into __dict__ when done. (In both cases, filtering out the internal "_ax" item that holds the C++ object, and they are references, so not too expensive to copy around). Then metadata= could simply set ".metadata" to whatever it contains. We could deprecate the metadata= keyword, even, since it no longer would be special.

@henryiii henryiii force-pushed the feat/metadata branch 4 times, most recently from aa1dfa6 to 390e057 Compare July 11, 2020 00:01
@henryiii
Copy link
Member Author

henryiii commented Jul 11, 2020

Lesson: always remember to rebase...

So, here are the options as I see them:

  1. Add .metadata to Histogram, and make it behave like the current Axis metadata. Anything else in __dict__ Does not propagate, or we turn it off.
  2. Use __getattr__/__setattr__ on metadata to make it behave a lot like a class with __dict__ if metadata is a dict. (Currently implemented on Axis in this PR).
    a. Option: Allow **kwargs to set arbitrary properties instead of using metadata= -> title="x" would set ax.metadata["title"] = "x".
  3. Use __dict__, then for Axis, use the workaround that (most of) this dict is set as metadata when performing operation that change the axes, like slicing and projection. (Currently implemented on Histogram in this PR)
    a. Option: .metadata is no longer special.
    b. Option: Drop metadata= keyword in Axis constructors

Personally, I think I like either option 1 (simple, forward compatible) or option 2 without option 2.a best. I think option 2 is most natural for tools (Uproot, hist, etc) to attach arbitrary metadata, but still provides reasonable access to it and behave like a normal __dict__ class. (PS: I generally don't like __dict__ classes, neither does MyPy - the general rule is new members should not be added after __init__). This is conceptually similar to the way Pandas handles columns and uproot handles data structures. The thing I like is that we can set a Protocol for plotting, and the Protocol can look for axes[i].title when plotting, and boost-histogram could supply it (manually), rather than only working on Hist histograms. But I'm mostly neutral, I'd be happy with any of them or a modification.

Whatever we do, we should do something externally similar for both Histogram / Axis.

@HDembinski, your call.

@HDembinski
Copy link
Member

Thank you for listing all these options. I see you spend a good amount of research on this.

It is not an easy choice, but after reading all that I am swayed to option 1. Funneling everything through the ".metadata" attribute has a centain appeal and then we can keep __dict__ turned off.

@henryiii
Copy link
Member Author

Okay, that's also the simplest/cleanest option. I'll prepare it.

@henryiii henryiii changed the title WIP: metadata and __slots__ feat: metadata and __slots__ Jul 11, 2020
@henryiii henryiii marked this pull request as ready for review July 11, 2020 20:06
@henryiii henryiii merged commit 3771643 into develop Jul 11, 2020
@henryiii henryiii deleted the feat/metadata branch July 11, 2020 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants