Even the best-laid plans don't survive first contact with the user. Let me know if you want these broken into separate issues; most of them are small. P.S. This is with v0.5.2.
I installed the package and played around with it. First reaction: the docstrings are quite sparse, probably because of all the pybind11 shenanigans.
It would be nice if the axis type hierarchy were visible from Python. At the very least, it would help to have something like `isinstance(x, bh.axis.axis)` that tells me whether an object is an axis.
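To make the request concrete, here is a pure-Python sketch of the kind of hierarchy I mean. The classes `Axis`, `Regular` and `Category` and the helper `is_axis` are hypothetical stand-ins, not the actual boost-histogram types:

```python
class Axis:
    """Hypothetical common base class shared by every axis type."""


class Regular(Axis):
    """Hypothetical stand-in for a regular (equal-width) axis."""

    def __init__(self, bins, start, stop):
        self.bins, self.start, self.stop = bins, start, stop


class Category(Axis):
    """Hypothetical stand-in for a category axis."""

    def __init__(self, categories, growth=False):
        self.categories = list(categories)
        self.growth = growth


def is_axis(x):
    # With a shared base class, a single isinstance check covers every axis type.
    return isinstance(x, Axis)
```

With a shared base like this, user code can check `is_axis(Regular(20, 0, 1))` and `is_axis(Category(["a"]))` the same way, without enumerating the concrete types.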
1 (see #216)
First dumb move:

```python
import boost_histogram as bh

ax = bh.axis.regular(20, 0, 1)
ax.value()
```

This raises a cryptic `TypeError`. I just didn't know that the method requires an argument.
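For reference, here is a sketch of what I expected `value` to compute on a regular axis. It assumes that `value(i)` returns the lower edge of bin `i`; that is my reading, not something I checked in the docs. The helper `regular_value` is hypothetical:

```python
def regular_value(i, bins, start, stop):
    """Lower edge of bin i on a regular axis with `bins` equal-width bins
    spanning [start, stop). Hypothetical helper, not boost-histogram API."""
    return start + (stop - start) * i / bins


# All 21 bin edges of regular(20, 0, 1): 0.0, 0.05, ..., 1.0
edges = [regular_value(i, 20, 0, 1) for i in range(21)]
```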
2 (see #215)
For growable categories, I hit several obstacles:

```python
ax = bh.axis.category([], growth=True)
```

This raises a `ValueError` saying there must be at least one bin.
Is it not possible to create a growable axis that starts with no categories?
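Conceptually, nothing seems to stop a growable axis from starting empty, because bins are only needed once a value arrives. Here is a pure-Python sketch of the behaviour I expected. The class `GrowableCategory` and its `grow_index` method are hypothetical:

```python
class GrowableCategory:
    """Hypothetical growable category axis that may start with no categories."""

    def __init__(self, categories=()):
        self.categories = list(categories)
        self._lookup = {c: i for i, c in enumerate(self.categories)}

    def grow_index(self, label):
        # Return the bin index for `label`, appending a new bin if it is unseen.
        if label not in self._lookup:
            self._lookup[label] = len(self.categories)
            self.categories.append(label)
        return self._lookup[label]

    @property
    def size(self):
        return len(self.categories)


ax = GrowableCategory()   # empty is fine: bins are added on demand
ax.grow_index("hi")       # -> 0
ax.grow_index("there")    # -> 1
ax.grow_index("hi")       # -> 0 again, no new bin
```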
3
(Answered well by a comment in #214.)
Playing with indexing:

```python
ax = bh.axis.category(['old'], growth=True)
ax.index('new')
# returns 1
ax.extent
# returns 1, why?
```

So I assume we should treat axis objects as immutable, with no state of their own with respect to filling?
Of course, if I make a histogram with an axis like this, everything works as expected:

```python
h = bh.histogram(bh.axis.category(['old'], growth=True))
h.fill(['hi', 'there'])
h.axis(0).extent
# returns 3
```
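Here is my best guess at what is going on, as a pure-Python sketch. `index` looks like a read-only lookup that returns one past the last category for an unseen label. The histogram seems to keep its own copy of the axis, so only that copy grows when filling. This is an assumption about v0.5.2's behaviour, not something I verified in the source. `CategoryAxis` and `Histogram` below are hypothetical:

```python
import copy


class CategoryAxis:
    """Hypothetical category axis; `index` never modifies the axis."""

    def __init__(self, categories):
        self.categories = list(categories)

    def index(self, label):
        # Unknown labels map to len(categories): the slot a new bin would use.
        try:
            return self.categories.index(label)
        except ValueError:
            return len(self.categories)


class Histogram:
    """Hypothetical 1D histogram that stores a private copy of its axis."""

    def __init__(self, axis):
        self.axis = copy.deepcopy(axis)  # the caller's axis is never touched
        self.counts = [0] * len(self.axis.categories)

    def fill(self, labels):
        for label in labels:
            i = self.axis.index(label)
            if i == len(self.axis.categories):  # grow on an unseen label
                self.axis.categories.append(label)
                self.counts.append(0)
            self.counts[i] += 1


ax = CategoryAxis(["old"])
ax.index("new")          # -> 1, but ax still has one category
h = Histogram(ax)
h.fill(["hi", "there"])  # h.axis now has 3 categories; ax still has 1
```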
4 (see #184)
The options for passing values to `fill` are a bit weird:

```python
h = bh.histogram(bh.axis.category(['old'], growth=True))
h.fill('hi')
```

This raises a `ValueError` about casting a string to `char const&`. Does that mean I cannot use plain strings?
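A thin Python wrapper could accept plain strings by treating a lone `str` as a one-element sequence before handing it to the C++ side. Here is a minimal sketch. `as_fill_sequence` is a hypothetical helper, not boost-histogram API:

```python
def as_fill_sequence(values):
    """Wrap a lone string (or other scalar) in a list; pass sequences through.

    Hypothetical helper: strings are special-cased so that "hi" is treated as
    one category label rather than the characters "h" and "i".
    """
    if isinstance(values, (str, bytes)):
        return [values]
    try:
        iter(values)
    except TypeError:
        return [values]  # non-iterable scalar such as 0.5
    return values


as_fill_sequence("hi")             # -> ['hi']
as_fill_sequence(["hi", "there"])  # -> ['hi', 'there']
as_fill_sequence(0.5)              # -> [0.5]
```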
4.a (see #230)
I see that the choice to interpret Python strings as iterables causes some headaches:

```python
import numpy as np

h = bh.histogram(bh.axis.category([''], growth=True), bh.axis.regular(20, 0, 1))
h.fill('hi', np.arange(4))
```

This raises `ValueError: spans must have compatible lengths`. Even when the lengths are compatible, the string-to-char-array issue comes up again.
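The length mismatch follows from Python's sequence protocol. Iterating a `str` yields its characters, so `'hi'` looks like a sequence of length 2 next to the 4 entries of `np.arange(4)`. Here is a stdlib-only illustration, using `range(4)` in place of `np.arange(4)`. `fill_lengths` is a hypothetical helper:

```python
def fill_lengths(*args):
    """Length of each argument when it is treated as a generic sequence."""
    return [len(list(a)) for a in args]


print(list("hi"))                      # ['h', 'i']
print(fill_lengths("hi", range(4)))    # [2, 4]: the lengths disagree
print(fill_lengths(["hi"], range(4)))  # [1, 4]
```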
5 (see #233)
OK, so sticking to arrays:

```python
h = bh.histogram(bh.axis.category([''], growth=True), bh.axis.regular(20, 0, 1))
h.fill(['hi'], np.arange(4))
```

This crashes with `Segmentation fault: 11`. Uh oh!

But this works fine:

```python
h = bh.histogram(bh.axis.category([''], growth=True), bh.axis.regular(20, 0, 1))
h.fill(['hi'], np.arange(1))
```
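Whatever the intended semantics, a length check before any memory is touched would turn the crash into an ordinary exception. Here is a pure-Python sketch of such a check. It assumes that every argument must either have the same length as the others or have length 1 and be broadcast, which is what the working 1D-array case further down suggests. `check_fill_lengths` is hypothetical:

```python
def check_fill_lengths(*args):
    """Return the common fill length, broadcasting length-1 arguments.

    Hypothetical validation: raises ValueError instead of reading past the
    end of a shorter input.
    """
    lengths = {len(a) for a in args}
    lengths.discard(1)  # length-1 inputs broadcast to any length
    if len(lengths) > 1:
        raise ValueError(f"incompatible fill lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 1


check_fill_lengths(["hi"], range(4))        # -> 4 (broadcast)
check_fill_lengths(["hi"], range(1))        # -> 1
# check_fill_lengths(["hi", "x"], range(4)) would raise ValueError
```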
5.a (see #230; not supported)
How about NumPy scalars (0-dimensional arrays)?

```python
h = bh.histogram(bh.axis.category([''], growth=True), bh.axis.regular(20, 0, 1))
h.fill(np.array('hi'), np.arange(3))
```

This raises a cryptic `ValueError: allocator<T>::allocate(size_t n) 'n' exceeds maximum supported size`.
But 1D NumPy arrays do work:

```python
h = bh.histogram(bh.axis.category([''], growth=True), bh.axis.regular(20, 0, 1))
h.fill(np.repeat('hi', 3), np.arange(3))
h.fill(np.array(['hi']), np.arange(3))
```

So broadcasting is supported. Nice!
6
When adding histograms, the growable categories are not as flexible as I'd like:

```python
h = bh.histogram(bh.axis.category([''], growth=True), bh.axis.regular(20, 0, 1))
h2 = bh.histogram(bh.axis.category([''], growth=True), bh.axis.regular(20, 0, 1))
h.fill(np.array(['hi']), np.array([.1]))
h2.fill(np.array(['hi again']), np.array([.2]))
h + h2
```

This raises `ValueError: axes of histograms differ`, even though either axis could clearly be grown to accept the other's categories.
I tried a workaround, filling both categories into each histogram with zero weight:

```python
h.fill(np.array(['hi', 'hi again']), np.array([0, 0]), weight=np.zeros(2))
h2.fill(np.array(['hi', 'hi again']), np.array([0, 0]), weight=np.zeros(2))
h + h2
```

This also fails, because the categories were added in a different order in each histogram: `h` has `'hi'` before `'hi again'`, while `h2` has them the other way around.
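What I had in mind for `h + h2` is to match bins by category label rather than by position, growing the result as needed. Here is a pure-Python sketch for a single category axis, representing each histogram as a `(categories, counts)` pair. The function `add_by_label` is hypothetical, not boost-histogram API:

```python
def add_by_label(cats_a, counts_a, cats_b, counts_b):
    """Add two 1D category histograms, aligning bins by label, not position.

    Categories that appear only in `b` are appended to the result,
    mimicking axis growth.
    """
    cats = list(cats_a)
    counts = list(counts_a)
    position = {c: i for i, c in enumerate(cats)}
    for c, n in zip(cats_b, counts_b):
        if c not in position:  # grow the result axis
            position[c] = len(cats)
            cats.append(c)
            counts.append(0)
        counts[position[c]] += n
    return cats, counts


# Same labels, different insertion order: the sum still lines up.
add_by_label(["", "hi", "hi again"], [0, 1, 0],
             ["", "hi again", "hi"], [0, 1, 0])
# -> (['', 'hi', 'hi again'], [0, 1, 1])
```

Because the lookup goes through the label, the order in which each histogram first saw a category stops mattering.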
7
About category axes: it looks like the storage is dense over the outer product of all the growable categories:

```python
import pickle

h = bh.histogram(
    bh.axis.category([''], growth=True),
    bh.axis.category([''], growth=True),
    bh.axis.category([''], growth=True),
    bh.axis.regular(60, 60, 120),
)
h.fill(
    ['dataset1', 'dataset2', 'dataset2'],
    ['region1', 'region1', 'region2'],
    ['', '', 'JESup'],
    [90., 86., 92.],
)
len(pickle.dumps(h))
```

This returns 9290 (787 for the same histogram before filling). For comparison:
```python
import coffea.hist as hist

h = hist.Hist('events',
    hist.Cat('dataset', ''),
    hist.Cat('region', ''),
    hist.Cat('systematic', ''),
    hist.Bin('mass', '', 60, 60, 120),
)
h.fill(dataset='dataset1', region='region1', systematic='', mass=90.)
h.fill(dataset='dataset2', region='region1', systematic='', mass=86.)
h.fill(dataset='dataset2', region='region2', systematic='JESup', mass=92.)
len(pickle.dumps(h))
```

This returns 4898 (2880 before filling). Coffea histograms carry considerably more pickling overhead than boost-histogram ones, but their sparse storage quickly makes up for it.
Is there a way to request sparse bin storage here?
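For comparison, a sparse layout stores only the cells that were actually filled. Here is a minimal pure-Python sketch that keeps counts in a dict keyed by a tuple of bin indices. It starts every category axis empty and ignores flow bins. `SparseHist` is hypothetical and only illustrates the idea; it is not an existing boost-histogram storage:

```python
from collections import defaultdict


class SparseHist:
    """Hypothetical sparse histogram: N growable category axes plus one
    regular axis; only non-empty cells are stored."""

    def __init__(self, n_cat_axes, bins, start, stop):
        self.categories = [[] for _ in range(n_cat_axes)]
        self.bins, self.start, self.stop = bins, start, stop
        self.cells = defaultdict(float)  # (cat indices..., bin) -> weight

    def _cat_index(self, axis, label):
        cats = self.categories[axis]
        if label not in cats:
            cats.append(label)  # grow the category axis
        return cats.index(label)

    def _bin_index(self, x):
        # Out-of-range values are dropped in this sketch (no flow bins).
        if not self.start <= x < self.stop:
            return None
        return int((x - self.start) * self.bins / (self.stop - self.start))

    def fill(self, *labels_and_value, weight=1.0):
        *labels, x = labels_and_value
        b = self._bin_index(x)
        if b is None:
            return
        key = tuple(self._cat_index(i, label) for i, label in enumerate(labels))
        self.cells[key + (b,)] += weight

    def dense_size(self):
        # Number of cells a dense outer-product layout would need.
        size = self.bins
        for cats in self.categories:
            size *= len(cats)
        return size


h = SparseHist(3, 60, 60.0, 120.0)
h.fill("dataset1", "region1", "", 90.0)
h.fill("dataset2", "region1", "", 86.0)
h.fill("dataset2", "region2", "JESup", 92.0)
```

Here the three fills occupy 3 stored cells. A dense layout over the same axes would need 2 × 2 × 2 × 60 = 480 cells.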