Pandas (and other array package) improved compatibility #380

@henryiii

boost-histogram currently doesn't work very well with Pandas DataFrames: each column has to be wrapped in np.asarray before filling (and sometimes needs an explicit cast to a NumPy string datatype).

  • My suggestion would be to do the np.asarray wrapping inside the Python fill wrapper, so that fill calls can be simplified when using non-NumPy-based arrays.

  • You also can't set up categories from arbitrary iterables, such as sets, but only from true lists, which is restrictive.
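
As a rough illustration of the first suggestion, the conversion could happen once, before the arguments reach the compiled fill. This is only a sketch: `normalize_fill_args` is a hypothetical helper name, not part of boost-histogram, and the rule for casting object arrays of strings is an assumption about what the "explicit cast to a NumPy string datatype" would need to cover.

```python
import numpy as np


def normalize_fill_args(*args):
    """Hypothetical helper: coerce each fill argument to a NumPy array."""
    arrays = []
    for arg in args:
        # np.asarray accepts lists, tuples, and array-likes such as pandas
        # Series, and returns the input unchanged if it is already an ndarray.
        arr = np.asarray(arg)
        # Assumption: object arrays holding only Python strings (as pandas
        # string columns typically produce) should become NumPy string arrays.
        if arr.dtype == object and all(isinstance(x, str) for x in arr.ravel()):
            arr = arr.astype(str)
        arrays.append(arr)
    return arrays
```

A fill wrapper could then call something like `normalize_fill_args(*args)` before passing the arrays on, so callers would no longer need to write np.asarray themselves.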

Actual usage:

hist = bh.Histogram(
    bh.axis.StrCategory(list(set(skhep.file_project))),
    bh.axis.Integer(2018, 2021, underflow=False, overflow=False),
    bh.axis.Integer(2, 4, underflow=False, overflow=False),
    storage=bh.storage.Int64()
)

hist.fill(np.asarray(skhep.file_project, dtype=str),
          np.asarray(skhep.timestamp.dt.year),
          np.asarray(skhep.details_python.str[0].astype(int))
)

Ideal usage:

hist = bh.Histogram(
    bh.axis.StrCategory(set(skhep.file_project)),
    bh.axis.Integer(2018, 2021, underflow=False, overflow=False),
    bh.axis.Integer(2, 4, underflow=False, overflow=False),
    storage=bh.storage.Int64()
)

hist.fill(skhep.file_project,
          skhep.timestamp.dt.year,
          skhep.details_python.str[0].astype(int)
)
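
For the category side, one possible approach is to normalize any iterable into a list before it reaches the axis constructor. The sketch below uses a hypothetical helper, `as_category_list`, which is not part of boost-histogram. Sorting sets is an assumption made here so that the bin order is reproducible, since sets have no defined order.

```python
def as_category_list(categories):
    """Hypothetical helper: turn any iterable of categories into a list."""
    if isinstance(categories, (set, frozenset)):
        # Sets are unordered, so sort for a reproducible bin order
        # (assumes the categories are mutually comparable, e.g. all str).
        return sorted(categories)
    # For other iterables, drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(categories))
```

With a helper like this, the list(set(...)) call in the "Actual usage" example could move inside the library, and bh.axis.StrCategory could accept the set directly.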
