Weighted choice algorithms

See also: https://github.com/dhardy/rand/issues/82, #518 

---

Fundamentally I see three types of weighted-choice algorithm:

1.  Calculate `weight_sum`, take `sample = rng.gen_range(0, weight_sum)`, then iterate over elements until the cumulative weight (including the current element) exceeds `sample`, and take that element.
2.  Calculate a CDF of weights (probably just an array of cumulative weights), take `sample` as above, then find the index by binary search and look up the element at that index.
3.  As follows:
    ```rust
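    use rand::Rng;
    use rand::distributions::uniform::SampleUniform;

    fn choose_weighted<R, F, I, X>(mut items: I, weight_fn: F, rng: &mut R) -> Option<I::Item>
    where
        R: Rng + ?Sized,
        I: Iterator,
        F: Fn(&I::Item) -> X,
        X: SampleUniform +
            Default +
            Clone +
            ::core::ops::AddAssign<X> +
            ::core::cmp::PartialOrd<X>
    {
        // The first item is the initial candidate; an empty iterator yields None.
        let mut result = if let Some(item) = items.next() {
            item
        } else {
            return None;
        };
        let mut sum = weight_fn(&result);

        while let Some(item) = items.next() {
            let weight = weight_fn(&item);
            sum += weight.clone();
            // Replace the candidate with probability weight / sum.
            // Assumes `X::default()` is zero (true for the primitive numeric
            // types) and that `sum` is positive, otherwise the range is empty.
            if rng.gen_range(X::default(), sum.clone()) < weight {
                result = item;
            }
        }
        Some(result)
    }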
    ```
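For comparison, the first algorithm might be sketched as below. This is only an illustration: the name `pick_linear` is hypothetical, weights are fixed to `u32`, and `sample` is assumed to have been drawn uniformly from `0..weight_sum` by the caller (so the function itself needs no RNG). It returns the first index whose cumulative weight, including its own weight, exceeds `sample`.

```rust
/// Linear scan over the weights. `sample` is assumed to be uniform in
/// `0..weight_sum`; the sum of weights is assumed to fit in a `u32`.
fn pick_linear(weights: &[u32], sample: u32) -> Option<usize> {
    let mut cumulative = 0u32;
    for (index, &weight) in weights.iter().enumerate() {
        cumulative += weight;
        // The first element whose cumulative weight exceeds `sample` is chosen,
        // so zero-weight elements are never selected.
        if sample < cumulative {
            return Some(index);
        }
    }
    // Reached only if `weights` is empty or `sample >= weight_sum`.
    None
}
```

With weights `[2, 0, 3, 5]`, samples 0 and 1 map to index 0, samples 2 to 4 map to index 2, and samples 5 to 9 map to index 3.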

---

Where one wants to sample from the same set of weights multiple times, calculating a CDF is the obvious choice since the CDF should require *no more memory* than the original weights themselves.
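A rough sketch of the CDF approach, to make the memory and lookup costs concrete. The names `cumulative_weights` and `pick_from_cdf` are made up for this example and are not the API proposed in #518; weights are fixed to `u32`, and `sample` is again assumed to be drawn uniformly from `0..weight_sum` by the caller.

```rust
/// Build the cumulative weights once; this has the same length as `weights`.
/// The sum of weights is assumed to fit in a `u32`.
fn cumulative_weights(weights: &[u32]) -> Vec<u32> {
    weights
        .iter()
        .scan(0u32, |total, &w| {
            *total += w;
            Some(*total)
        })
        .collect()
}

/// Binary search for the first index whose cumulative weight exceeds `sample`.
fn pick_from_cdf(cdf: &[u32], sample: u32) -> Option<usize> {
    // `partition_point` requires the predicate to be true for a prefix of the
    // slice, which holds because cumulative weights are non-decreasing.
    let index = cdf.partition_point(|&c| c <= sample);
    if index < cdf.len() {
        Some(index)
    } else {
        None
    }
}
```

The CDF is built in O(n) once, after which each sample costs O(log n); the last entry of the CDF is the total weight needed to draw `sample`.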

Where one wants to sample a single time from a slice, one of the first two choices makes the most sense; since calculating the total weight requires all the work of calculating the CDF *except storing the results*, using the CDF may often be the best option but this isn't guaranteed.

Where one wants to sample a single time from an iterator, any of the above can be used, but the first two options require either cloning the iterator and iterating twice (not always possible and slightly expensive) or collecting all items into a temporary vector while calculating the sum/CDF, then selecting the required item. In this case the last option may be attractive, though of course sampling the RNG for every item has significant overhead (so it is probably only useful when elements are large or no allocator is available).
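The "collect into a temporary vector" variant could look something like the following sketch. Everything here is illustrative: `choose_by_collecting` is a hypothetical name, weights are `u32`, and the RNG is abstracted as a `draw` closure that is assumed to return a uniform value in `0..total` (e.g. by calling `rng.gen_range(0, total)`).

```rust
/// Single pass over the iterator, storing each item with its cumulative weight,
/// followed by one draw and a binary search. Requires an allocator.
fn choose_by_collecting<I, F, D>(items: I, weight_fn: F, draw: D) -> Option<I::Item>
where
    I: Iterator,
    F: Fn(&I::Item) -> u32,
    D: FnOnce(u32) -> u32,
{
    let mut total = 0u32;
    let mut collected = Vec::new();
    for item in items {
        total += weight_fn(&item);
        collected.push((total, item));
    }
    if total == 0 {
        // Empty iterator or all weights zero: nothing can be chosen.
        return None;
    }
    // `draw(total)` is assumed to be uniform in `0..total`.
    let sample = draw(total);
    // First entry whose cumulative weight exceeds `sample`.
    let index = collected.partition_point(|(cumulative, _)| *cumulative <= sample);
    collected.into_iter().nth(index).map(|(_, item)| item)
}
```

Compared with the third algorithm, this draws from the RNG only once but holds every item in memory until the choice is made.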

---

Which algorithm(s) should we include in Rand?

The method calculating the CDF will often be preferred, so should be included. Unfortunately it requires an allocator (unless weights are provided via a mutable reference to a slice), but we should probably not worry about this.
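To illustrate the allocator-free exception: if the caller hands over a mutable slice of weights, the CDF can overwrite it in place. A minimal sketch, assuming `u32` weights whose sum does not overflow (`accumulate_in_place` is a hypothetical helper, not part of any proposed API):

```rust
/// Replace each weight with the cumulative weight up to and including it,
/// returning the total. No allocation is needed, but the original weights
/// are overwritten.
fn accumulate_in_place(weights: &mut [u32]) -> u32 {
    let mut total = 0u32;
    for w in weights.iter_mut() {
        total += *w;
        *w = total;
    }
    total
}
```

After this call the slice can be searched directly, e.g. with `weights.partition_point(|&c| c <= sample)` for a `sample` drawn from `0..total`.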

A convenience method to sample from weighted slices would presumably prefer to use the CDF method normally.

For a method to sample from weighted iterators it is less clear which implementation should be used. Although it will not perform well, the last algorithm (i.e. sample code above) may be a nice choice in that it does not require an allocator.
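For reference, a short argument that the last algorithm does select each item with probability proportional to its weight. The notation is introduced here only for the derivation: $w_i$ is the weight of the $i$-th item, $n$ is the number of items, and $S_k$ is the running sum after $k$ items (assumed positive from the first item on). Item $k$ becomes the candidate at step $k$ with probability $w_k / S_k$ and is kept at each later step $j$ with probability $1 - w_j / S_j$, so the product telescopes:

```latex
S_k = \sum_{i=1}^{k} w_i, \qquad
P(\text{item } k \text{ is returned})
  = \frac{w_k}{S_k} \prod_{j=k+1}^{n} \left(1 - \frac{w_j}{S_j}\right)
  = \frac{w_k}{S_k} \prod_{j=k+1}^{n} \frac{S_{j-1}}{S_j}
  = \frac{w_k}{S_k} \cdot \frac{S_k}{S_n}
  = \frac{w_k}{S_n}.
```

For $k = 1$ the first factor is $w_1 / S_1 = 1$, matching the code, which always takes the first item as the initial candidate.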

---

My conclusion: perhaps we should accept #518 in its *current* form (i.e. `WeightedIndex` distribution using CDF + binary search, and convenience wrappers for slices), plus consider adding the code here to sample from iterators.

Weighted choice algorithms #532
