2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -73,7 +73,7 @@ jobs:

- name: Test
working-directory: ./build
run: python -m pytest
run: python -m pytest -ra

build_wheels:
name: Wheels on ${{ matrix.os }}
11 changes: 7 additions & 4 deletions README.md
@@ -56,8 +56,7 @@ hist.fill(
)

# Numpy array view into histogram counts, no overflow bins
counts = hist.view()

values = hist.values()
```

## Features
@@ -75,7 +74,7 @@ counts = hist.view()
* `bh.axis.Integer(start, stop, underflow=True, overflow=True, growth=False)`: Special high-speed version of `regular` for evenly spaced bins of width 1
* `bh.axis.Variable([start, edge1, edge2, ..., stop], underflow=True, overflow=True)`: Uneven bin spacing
* `bh.axis.Category([...], growth=False)`: Integer or string categories
* `bh.axis.Boolean()`: A True/False axis [(known issue with slicing/selection in 0.8.0)]()
* `bh.axis.Boolean()`: A True/False axis
* Axis features:
* `.index(value)`: The index at a point (or points) on the axis
* `.value(index)`: The value for a fractional bin (or bins) in the axis
@@ -84,7 +83,7 @@ counts = hist.view()
* `.edges`: The N+1 bin edges (if continuous)
* `.extent`: The number of bins (including under/overflow)
* `.metadata`: Anything a user wants to store
* `.options`: The options set on the axis (`bh.axis.options`)
* `.traits`: The options set on the axis (`bh.axis.options`)
* `.size`: The number of bins (not including under/overflow)
* `.widths`: The N bin widths
* Many storage types
@@ -106,10 +105,14 @@ counts = hist.view()
* `+`: Add two histograms (storages must match types currently)
* `*=`: Multiply by a scalar (not all storages) (`hist * scalar` and `scalar * hist` supported too)
* `/=`: Divide by a scalar (not all storages) (`hist / scalar` supported too)
* `.kind`: Either `bh.Kind.COUNT` or `bh.Kind.MEAN`, depending on storage
* `.sum(flow=False)`: The total count of all bins
* `.project(ax1, ax2, ...)`: Project down to listed axis (numbers)
* `.to_numpy(flow=False)`: Convert to a NumPy style tuple (with or without under/overflow bins)
* `.view(flow=False)`: Get a view on the bin contents (with or without under/overflow bins)
* `.values(flow=False)`: Get a view on the values (counts or means, depending on storage)
* `.variances(flow=False)`: Get the variances if available
* `.counts(flow=False)`: Get the effective counts for all storage types
* `.reset()`: Set counters to 0
* `.empty(flow=False)`: Check to see if the histogram is empty (can check flow bins too if asked)
* `.copy(deep=False)`: Make a copy of a histogram
13 changes: 13 additions & 0 deletions docs/CHANGELOG.md
@@ -9,6 +9,11 @@ Pressing forward to 1.0.

#### User changes

* Support for PlottableProtocol. You can now access `.values()`, `.counts()`,
and `.variances()` on all storages; used by plotting libraries. `.kind` describes
the Kind of the histogram (`bh.Kind.COUNT` or `bh.Kind.MEAN`). `.options` has
been renamed to `.traits`, and a few more useful traits were added, like
`.discrete`. Most other portions of the Protocol were already present. [#476][]
* You can now set all complex storages, either on a Histogram or a View with an
(N+1)D array [#475][]
* Axes are now normal `__dict__` classes, you can manipulate the `__dict__` as
@@ -26,9 +31,17 @@ Pressing forward to 1.0.
* Bumped to pybind11 2.6.1 [#470][]
* Black formatting used in notebooks too [#470][]


#### Upgrade warning

If you are using `Axis.options`, please transition to `Axis.traits`. `traits`
includes all the old options, along with some new traits, and matches the
PlottableProtocol requirements.
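
Code that has to run against both older and newer releases during the transition could use a small fallback helper. The following is only a sketch: `axis_traits` is a hypothetical helper name, not part of boost-histogram, and the stand-in objects exist purely to make the example self-contained.

```python
from types import SimpleNamespace


def axis_traits(ax):
    """Return ``ax.traits`` when present, else fall back to the older ``ax.options``.

    Hypothetical helper for illustration; not part of boost-histogram.
    """
    traits = getattr(ax, "traits", None)
    return traits if traits is not None else ax.options


# Stand-in objects (assumptions) mimicking an old-style and a new-style axis.
old_axis = SimpleNamespace(options="old-options")
new_axis = SimpleNamespace(options="old-options", traits="new-traits")

print(axis_traits(old_axis))  # falls back to .options
print(axis_traits(new_axis))  # prefers .traits
```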

[#470]: https://github.com/scikit-hep/boost-histogram/pull/470
[#472]: https://github.com/scikit-hep/boost-histogram/pull/472
[#475]: https://github.com/scikit-hep/boost-histogram/pull/475
[#476]: https://github.com/scikit-hep/boost-histogram/pull/476
[#477]: https://github.com/scikit-hep/boost-histogram/pull/477


1 change: 1 addition & 0 deletions docs/index.rst
@@ -62,6 +62,7 @@ See :ref:`usage-quickstart` for more.
usage/accumulators
usage/transforms
usage/indexing
usage/plotting
usage/analyses
usage/numpy
usage/comparison
10 changes: 8 additions & 2 deletions docs/usage/histogram.rst
@@ -16,10 +16,15 @@ All storages support a ``weight=`` parameter, and some storages support a ``samp

The summing accumulators (not ``Mean()`` and ``WeightedMean()``) support threaded filling. Pass ``threads=N`` to ``.fill()`` to fill with ``N`` threads (using 0 selects the number of virtual cores on your system). This helps only if you have a large number of entries compared to your number of bins, as all non-atomic storages make a copy for each thread and recombine them after the fill is complete.

Data
^^^^

The primary value from a histogram is always available as ``.values()``. The variance is available as ``.variances()``, unless you fill an unweighted histogram with weights, in which case it returns ``None`` because the variance is no longer computable (use a weighted storage if you need the variances). The counts are available as ``.counts()``. If the histogram is weighted, ``.counts()`` returns the effective counts; see TODO for details.
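
The text above leaves the definition of effective counts as a TODO. A common definition for a single weighted bin is (sum of weights)² / (sum of squared weights); the sketch below assumes that definition and uses a hypothetical helper name, so treat it as an illustration rather than a statement of what ``.counts()`` computes internally.

```python
def effective_counts(weights):
    """Effective number of entries for one bin: (sum w)^2 / sum(w^2).

    Assumed definition for illustration; ``effective_counts`` is not a
    boost-histogram function.
    """
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    if sum_w2 == 0:
        # An empty bin (or all-zero weights) has no effective entries.
        return 0.0
    return sum_w * sum_w / sum_w2


# Equal weights reproduce the plain number of fills.
print(effective_counts([1.0, 1.0, 1.0, 1.0]))
# Unequal weights give fewer effective entries than actual fills.
print(effective_counts([3.0, 1.0]))
```

With equal weights the result matches the raw number of fills, which is why an unweighted histogram needs no special handling.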

Views
^^^^^

While Histograms do conform to the Python buffer protocol, the best way to get access to the contents of a histogram as a Numpy array is with ``.view()``. This way you can optionally pass ``flow=True`` to get the flow bins, and if you have an accumulator storage, you will get a View, which is a slightly augmented ndarrray subclass (see :ref:`usage-accumulators`).
While Histograms do conform to the Python buffer protocol, the best way to get access to the raw contents of a histogram as a NumPy array is with ``.view()``. This way you can optionally pass ``flow=True`` to get the flow bins, and if you have an accumulator storage, you will get a View, which is a slightly augmented ndarray subclass (see :ref:`usage-accumulators`).


Operations
@@ -35,7 +40,7 @@ Operations

* ``.sum(flow=False)``: The total count of all bins
* ``.project(ax1, ax2, ...)``: Project down to listed axis (numbers)
* ``.to_numpy(flow=False)``: Convert to a Numpy style tuple (with or without under/overflow bins)
* ``.to_numpy(flow=False)``: Convert to a NumPy style tuple (with or without under/overflow bins)
* ``.view(flow=False)``: Get a view on the bin contents (with or without under/overflow bins)
* ``.reset()``: Set counters to 0
* ``.empty(flow=False)``: Check to see if the histogram is empty (can check flow bins too if asked)
@@ -49,6 +54,7 @@ Operations
* ``.axes.centers``: The centers of the bins broadcasting-ready array
* ``.axes.widths``: The bin widths as a broadcasting-ready array
* ``.axes.metadata``: A tuple of the axes metadata
* ``.axes.traits``: A tuple of the axes traits

* ``.axes.size``: A tuple of the axes sizes (size without flow)
* ``.axes.extent``: A tuple of the axes extents (size with flow)
24 changes: 12 additions & 12 deletions docs/usage/numpy.rst
@@ -1,6 +1,6 @@
.. _usage-numpy:

Numpy compatibility
NumPy compatibility
===================

Histogram conversion
@@ -12,19 +12,19 @@ Accessing the storage array
You can access the storage of any Histogram using ``.view()``, see
:ref:`usage-histogram`.

Numpy tuple output
NumPy tuple output
^^^^^^^^^^^^^^^^^^

You can directly convert a histogram into the tuple of outputs that
``np.histogram*`` would give you using ``.to_numpy()`` or
``.to_numpy(flow=True)`` on any histogram. This returns
``edges[0], edges[1], ..., view``, and the edges are Numpy-style (upper edge
``edges[0], edges[1], ..., view``, and the edges are NumPy-style (upper edge
inclusive).

Numpy adaptors
NumPy adaptors
--------------

You can use boost-histogram as a drop in replacement for Numpy histograms. All
You can use boost-histogram as a drop in replacement for NumPy histograms. All
three histogram functions (``bh.numpy.histogram``, ``bh.numpy.histogram2d``, and
``bh.numpy.histogramdd``) are provided. The syntax is identical, though
boost-histogram adds three new keyword-only arguments; ``storage=`` to select the
@@ -58,9 +58,9 @@ Of course, you then are either left on your own to compute centers,
density, widths, and more, or in some cases you can change the
computation call itself to add ``density=``, or use the matching
function inside Matplotlib, and the API is different if you want 2D or
ND histograms. But if you already use Numpy histograms and you really
ND histograms. But if you already use NumPy histograms and you really
don’t want to rewrite your code, boost-histogram has adaptors for the
three histogram functions in Numpy:
three histogram functions in NumPy:

.. code:: python3

@@ -72,14 +72,14 @@ three histogram functions in Numpy:
7.3 ms ± 55.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This is only a hair slower than using the raw boost-histogram API,
and is still a nice performance boost over Numpy. You can even use the
Numpy syntax if you want a boost-histogram object later:
and is still a nice performance boost over NumPy. You can even use the
NumPy syntax if you want a boost-histogram object later:

.. code:: python3

hist = bh.numpy.histogram(norm_vals, bins=100, range=(0, 10), histogram=bh.Histogram)

You can later get a Numpy style output tuple from a histogram object:
You can later get a NumPy style output tuple from a histogram object:

.. code:: python3

@@ -98,7 +98,7 @@ So you can transition your code slowly to boost-histogram.
((1, 0),(0, .5)),
10_000_000).T.copy()

We can check the performance against Numpy again; Numpy does not do well
We can check the performance against NumPy again; NumPy does not do well
with regular spaced bins in more than 1D:

.. code:: python3
@@ -120,7 +120,7 @@ with regular spaced bins in more than 1D:
101 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For more than one dimension, boost-histogram is more than an order of
magnitude faster than Numpy for regular spaced binning. Although
magnitude faster than NumPy for regular spaced binning. Although
optimizations may be added to boost-histogram for common axes
combinations later, in 0.6.1, all axes combinations share a common code
base, so you can expect *at least* this level of performance regardless
40 changes: 40 additions & 0 deletions docs/usage/plotting.rst
@@ -0,0 +1,40 @@
.. _usage-plotting:

Plotting
========

Boost-histogram does not contain plotting functions - this is outside of the
scope of the project, which is histogram filling and manipulation. However, it
does follow ``PlottableProtocol``, as listed below. Any plotting library that
accepts an object that follows the ``PlottableProtocol`` can plot boost-histogram
objects.

Using the protocol:

Plotters should only depend on the methods and attributes listed below. In short, they are:

* ``h.kind``: The ``bh.Kind`` of the histogram (COUNT or MEAN)
* ``h.values()``: The value (as given by the kind)
* ``h.variances()``: The variance in the value (None if an unweighted histogram was filled with weights)
Member

This wording is not correct, because .variances() does not give you the variance of the mean if kind == MEAN it gives the variances of the samples. You need to divide that by .counts() to get the variance of the mean.

Member Author

@henryiii henryiii Dec 28, 2020

I thought we decided in the PlottableProtocol that .variances() is always the variance of .value(), so that a "dumb" plotter could just plot values and variances and ignore the .kind and get something reasonable. It is also simpler if ".values" is the value of the Kind, and the ".variances" is the variances of the Kind. So in this case, the wording is correct, but the value is incorrect, it should be calculated as .view()["variance"] / .view()["count"]? And a user wishing to get the sample variance would multiply by the .counts()?

I could easily be wrong here, though, either/both about what we decided and what is best. @jpivarski what does Uproot return for .variances() here for a TProfile?

* ``h.counts()``: How many fills the bin received or the effective number of fills if the histogram is weighted
* ``h.axes``: A Sequence of axes

Axes have:

* ``ax[i]``: A tuple of (lower, upper) bin edges, or the discrete bin value (integer or string)
* ``len(ax)``: The number of bins
* ``ax.traits.circular``: True if circular
* ``ax.traits.discrete``: True if the bin represents a single value (e.g. Integer or Category axes) instead of an interval (e.g. Regular or Variable axes)

Plotters should see if ``.counts()`` is None; no boost-histogram objects currently
return None, but a future storage or different library could.

Also check ``.variances()``; if not None, this storage holds variance information and
error bars should be included. Boost-histogram histograms will return something
unless they know that this is an invalid assumption (a weighted fill was made
on an unweighted histogram).

The full protocol version 1 follows:

.. literalinclude:: ../../tests/plottable.py
:language: python
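
As a rough illustration of how a consumer might use only the members listed above, the sketch below builds text rows from a 1D histogram-like object. Everything named `Fake*`, the local `Kind` enum, and `describe` are hypothetical stand-ins written for this example; they are not boost-histogram classes, and real objects return NumPy arrays rather than lists.

```python
from enum import Enum
from math import sqrt


class Kind(Enum):
    # Stand-in for bh.Kind; member values here are illustrative only.
    COUNT = "COUNT"
    MEAN = "MEAN"


class FakeTraits:
    """Hypothetical stand-in for ``ax.traits``."""

    def __init__(self, circular=False, discrete=False):
        self.circular = circular
        self.discrete = discrete


class FakeAxis:
    """Hypothetical axis: indexable bins, a length, and traits."""

    def __init__(self, bins, discrete=False):
        self._bins = list(bins)
        self.traits = FakeTraits(discrete=discrete)

    def __getitem__(self, i):
        return self._bins[i]

    def __len__(self):
        return len(self._bins)


class FakeHist:
    """Hypothetical 1D histogram exposing the members a plotter relies on."""

    def __init__(self, axis, values, variances=None):
        self.kind = Kind.COUNT
        self.axes = [axis]
        self._values = list(values)
        self._variances = None if variances is None else list(variances)

    def values(self):
        return self._values

    def variances(self):
        return self._variances

    def counts(self):
        return self._values


def describe(h):
    """Return one (label, value, error) row per bin; error is None without variances."""
    ax = h.axes[0]
    values = h.values()
    variances = h.variances()
    rows = []
    for i in range(len(ax)):
        # Discrete bins are a single value; continuous bins are (lower, upper).
        label = str(ax[i]) if ax.traits.discrete else "[{}, {})".format(*ax[i])
        err = None if variances is None else sqrt(variances[i])
        rows.append((label, values[i], err))
    return rows


continuous = FakeHist(FakeAxis([(0.0, 1.0), (1.0, 2.0)]), [4.0, 9.0], [4.0, 9.0])
categorical = FakeHist(FakeAxis(["a", "b"], discrete=True), [1.0, 2.0])
print(describe(continuous))
print(describe(categorical))
```

The `describe` function skips error bars whenever `variances()` returns None, which mirrors the check recommended above.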
27 changes: 20 additions & 7 deletions docs/usage/quickstart.rst
@@ -84,11 +84,17 @@ See :ref:`usage-indexing`.
Accessing the contents
----------------------

You can use ``hist.view()`` to get
a Numpy array (or a RecArray-like wrapper for non-simple storages).
Most methods like ``.view()`` offer an optional keyword
argument that you can pass, ``flow=True``, to enable the under and
overflow bins (disabled by default).
You can use ``hist.values()`` to get a NumPy array from any histogram. You can
get the variances with ``hist.variances()``, though if you fill an unweighted
storage with weights, this will return None, as you no longer can compute the
variances correctly (please use a weighted storage if you need to). You can
also get the number of entries in a bin with ``.counts()``; this will return
counts even if your storage is a mean storage. See :ref:`usage-plotting`.

If you want access to the full underlying storage, ``.view()`` will return a
NumPy array for simple storages or a RecArray-like wrapper for non-simple
storages. Most methods offer an optional keyword argument that you can pass,
``flow=True``, to enable the under and overflow bins (disabled by default).

.. code:: python3

@@ -98,7 +104,7 @@ overflow bins (disabled by default).
Setting the contents
--------------------

You can set the contents directly as you would a Numpy array;
You can set the contents directly as you would a NumPy array;
you can set either values or arrays at a time:

.. code:: python3
@@ -107,8 +113,15 @@ you can set either values or arrays at a time:
hist[bh.underflow] = 0 # set the underflow bin
hist2d[3:5, 2:4] = np.eye(2) # set with array

See :ref:`usage-indexing`.
For non-simple storages, you can add an extra dimension that matches the
constructor arguments of that accumulator. For example, if you want to set
three bins of a Weight histogram, you can add a second dimension holding each
bin's constructor arguments:

.. code:: python3

hist[0:3] = [[1,.1], [2, .2], [3, .3]]

See :ref:`usage-indexing`.

Accessing Axes
--------------
28 changes: 28 additions & 0 deletions include/bh_python/register_axis.hpp
@@ -114,6 +114,34 @@ py::class_<A> register_axis(py::module& m, Args&&... args) {
}
})

.def_property_readonly("traits_underflow",
[](const A& self) {
return static_cast<bool>(
self.options() & bh::axis::option::underflow);
})
.def_property_readonly("traits_overflow",
[](const A& self) {
return static_cast<bool>(
self.options() & bh::axis::option::overflow);
})
.def_property_readonly("traits_circular",
[](const A& self) {
return static_cast<bool>(
self.options() & bh::axis::option::circular);
})
.def_property_readonly("traits_growth",
[](const A& self) {
return static_cast<bool>(self.options()
& bh::axis::option::growth);
})
.def_property_readonly(
"traits_continuous",
[](const A& self) { return bh::axis::traits::continuous(self); })
.def_property_readonly(
"traits_ordered",
[](const A& self) { return bh::axis::traits::ordered(self); })

// Deprecated - use the .traits property filled by the values above instead.
.def_property_readonly(
"options",
[](const A&) { return options{bh::axis::traits::get_options<A>::value}; },
1 change: 1 addition & 0 deletions setup.cfg
@@ -44,6 +44,7 @@ package_dir =
install_requires =
numpy >=1.13.3
typing >= 3.5; python_version < '3.5'
enum34 >= 1.1; python_version < '3.4'

[options.packages.find]
where = src
2 changes: 1 addition & 1 deletion setup.py
@@ -49,7 +49,7 @@


extras = {
"test": ["pytest", "pytest-benchmark"],
"test": ["pytest", "pytest-benchmark", "typing_extensions"],
"docs": [
"Sphinx~=3.0",
"recommonmark>=0.5.0",
2 changes: 2 additions & 0 deletions src/boost_histogram/__init__.py
@@ -2,6 +2,7 @@
from __future__ import absolute_import, division, print_function

from ._internal.hist import Histogram
from ._internal.enum import Kind
from . import axis, storage, accumulators, utils, numpy
from .tag import loc, rebin, sum, underflow, overflow

@@ -37,6 +38,7 @@

__all__ = (
"Histogram",
"Kind",
"axis",
"storage",
"accumulators",