Feature gap between Java and HTTP APIs for Percentiles aggregation #23610

@mrec

This is a pretty minor edge case, but the blog post on retiring the old Java API asks for feedback and this is something we ran into recently. (In 2.3.2, though I don't think anything's changed in later versions.)

In the HTTP API, when you request e.g. [25, 50, 75] percentiles, all you get back is exactly that. In the Java API, on the other hand, you get back a serialized InternalTDigestPercentiles which can be queried for the specific percentiles you requested, but also for any other percentile you might be interested in.

We use Percentiles in a calibration request to find bucket boundaries for a subsequent Range aggregation. This becomes relevant when you have a very skewed distribution swamped by a single value, so that the [25, 50, 75] percentiles might all be the same value. Since the Range agg drives a faceted drilldown UI, and a single-bucket navigator is pretty pointless, in this case we use the InternalTDigestPercentiles to binary-chop around until we find some useful bucket boundaries. AIUI this would no longer be possible with the HTTP API.
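To make the binary-chop idea concrete, here is a minimal sketch in Python. It is not our actual code. It assumes only that you have some callable returning the value at an arbitrary percent, which is what the deserialized digest gives you on the Java side. The names `upper_edge` and `nearest_rank` are hypothetical, and `nearest_rank` is just a stand-in for the digest so the example runs on its own.

```python
import math
from typing import Callable, Optional, Sequence


def nearest_rank(sorted_values: Sequence[float], percent: float) -> float:
    """Stand-in for a percentile lookup (hypothetical helper, not an ES API)."""
    n = len(sorted_values)
    rank = max(1, math.ceil(percent / 100.0 * n))
    return sorted_values[min(n, rank) - 1]


def upper_edge(
    percentile: Callable[[float], float],
    start: float,
    end: float = 100.0,
    tol: float = 0.5,
) -> Optional[float]:
    """Bisect for a percent just past where the value rises above percentile(start).

    Relies on percentile() being non-decreasing in its argument.
    Returns None when the value never changes between start and end.
    """
    base = percentile(start)
    if percentile(end) <= base:
        return None  # the whole upper tail is swamped by the same value
    lo, hi = start, end
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if percentile(mid) > base:
            hi = mid  # change point is at or below mid
        else:
            lo = mid  # still on the dominant value
    return hi


# 90 copies of one value, then a thin tail: 25/50/75 all come back as 5.
data = [5] * 90 + list(range(6, 16))
edge = upper_edge(lambda p: nearest_rank(data, p), start=75.0)
```

Here `edge` lands just above the 90th percentile, where the tail starts, and could then serve as one Range boundary. The same chop run downward from the lowest requested percentile would look for an edge on the other side.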

It's not the end of the world; one alternative we considered was requesting a lot more percentiles up front to improve our chances of getting 3 different ones, and that wouldn't depend on the Java API. In a perfect world there'd be a Range variant for "N optimally-balanced buckets" instead of for explicit bucket boundaries, which would do away with the need for the extra calibration round-trip, but that's probably wishful thinking.
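For illustration, the "request more percentiles up front" alternative might look roughly like the sketch below. The aggregation name `calibration`, the helper names, and the 5-point step are all assumptions made for the example, and thinning the distinct values evenly is just one arbitrary way to pick boundaries. The request body uses the `percents` option of the percentiles aggregation, and the second helper expects the `values` map from the HTTP response (percent string to value).

```python
from typing import Dict, List, Optional


def percentiles_request(field: str, step: float = 5.0) -> dict:
    """Build a search body asking for percentiles at every `step` percent."""
    percents = [step * i for i in range(1, int(100 / step))]
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "calibration": {  # hypothetical aggregation name
                "percentiles": {"field": field, "percents": percents}
            }
        },
    }


def distinct_boundaries(
    values: Dict[str, Optional[float]], wanted: int = 3
) -> List[float]:
    """Pick up to `wanted` distinct boundary values from a percentiles result."""
    distinct = sorted({v for v in values.values() if v is not None})
    if len(distinct) <= wanted:
        return distinct  # may be fewer than wanted if the data is too skewed
    n = len(distinct)
    # Thin the distinct values out evenly; indices are strictly increasing.
    return [distinct[(i + 1) * n // (wanted + 1)] for i in range(wanted)]


body = percentiles_request("price")  # "price" is a made-up field name
skewed = {"25.0": 5.0, "50.0": 5.0, "75.0": 5.0, "90.0": 7.0, "95.0": 9.0}
boundaries = distinct_boundaries(skewed)
```

This only improves the odds. If every requested percentile still comes back as the same value, you are left with a single boundary and no way to probe further without another round-trip.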
