Description
This is a pretty minor edge case, but the blog post on retiring the old Java API asks for feedback and this is something we ran into recently. (In 2.3.2, though I don't think anything's changed in later versions.)
In the HTTP API, when you request e.g. [25, 50, 75] percentiles, you get back exactly those three values and nothing else. In the Java API, on the other hand, you get back a serialized InternalTDigestPercentiles, which can be queried for the specific percentiles you requested but also for any other percentile you might be interested in.
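To make the HTTP side concrete, here is a minimal sketch of the request body and of reading the result back. The aggregation name `calibration`, the field name `price`, and the sample response values are made up for illustration; only the `percentiles` / `percents` / `values` structure is meant to reflect the HTTP API.

```python
from typing import Dict, Iterable


def percentiles_request(field: str, percents: Iterable[float],
                        name: str = "calibration") -> Dict:
    """Build a search body containing a single percentiles aggregation."""
    return {
        "size": 0,  # only the aggregation is of interest, not the hits
        "aggs": {
            name: {"percentiles": {"field": field, "percents": list(percents)}}
        },
    }


def requested_values(response: Dict, name: str = "calibration") -> Dict[float, float]:
    """Extract {percent: value} from a response; only requested percents appear."""
    raw = response["aggregations"][name]["values"]
    return {float(percent): value for percent, value in raw.items()}


body = percentiles_request("price", [25, 50, 75])  # "price" is a hypothetical field

# Illustrative response for a skewed field; the numbers are invented.
sample_response = {
    "aggregations": {
        "calibration": {"values": {"25.0": 0.0, "50.0": 0.0, "75.0": 0.0}}
    }
}
values = requested_values(sample_response)
```

There is nothing in `values` to query for, say, the 90th percentile: getting it means sending another request.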
We use Percentiles in a calibration request to find bucket boundaries for a subsequent Range aggregation. This becomes relevant when you have a very skewed distribution swamped by a single value, so that the [25, 50, 75] percentiles might all be the same value. Since the Range agg drives a faceted drill-down UI and a single-bucket navigator is pretty pointless, in this case we query the InternalTDigestPercentiles for other percentiles, binary-chopping around until we find some useful bucket boundaries. As I understand it, this would no longer be possible with the HTTP API.
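The exact search isn't spelled out here, so the following is only one plausible reading of the binary chop, sketched in Python. The `percentile` callable stands in for querying the returned percentiles object at an arbitrary percent; `make_percentile` is a hypothetical nearest-rank helper that exists only so the sketch runs without a cluster. The bisection relies on percentile values being non-decreasing in the percent.

```python
import math
from typing import Callable, Optional, Sequence


def make_percentile(values: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical stand-in for a queryable percentiles result (nearest rank)."""
    ordered = sorted(values)

    def percentile(percent: float) -> float:
        rank = math.ceil(percent * len(ordered) / 100)
        return ordered[min(len(ordered) - 1, max(0, rank - 1))]

    return percentile


def find_change_point(percentile: Callable[[float], float],
                      lo: float, hi: float, tol: float = 0.01) -> Optional[float]:
    """Bisect (lo, hi] for a percent just past where the value rises above percentile(lo).

    Returns None when the value never rises within the interval.
    """
    base = percentile(lo)
    if percentile(hi) <= base:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if percentile(mid) > base:
            hi = mid  # value already differs here, so the change is at or below mid
        else:
            lo = mid  # still the dominant value, so the change is above mid
    return hi


# 90 zeros swamp a short tail of 1..10, so 25/50/75 all come back as 0.
skewed = make_percentile([0] * 90 + list(range(1, 11)))
change = find_change_point(skewed, 75, 100)
first_tail_value = skewed(change)
```

From `change` upward the percentile values are no longer all identical, so that stretch of the distribution is where usable Range boundaries can be picked.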
It's not the end of the world; one alternative considered was requesting a lot more percentiles up front to improve our chances of getting 3 different ones, and that wouldn't depend on the API. In a perfect world there'd be a Range variant for "N optimally-balanced buckets" instead of for explicit bucket boundaries, which would do away with the need for the extra calibration round-trip, but that's probably wishful thinking.
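A rough sketch of the "request a lot more percentiles up front" alternative, which only needs the values the HTTP API already returns. The helper names, the 5-percent step, and the way boundaries are thinned out are all assumptions for illustration; the `from` / `to` entries follow the shape of the Range aggregation's `ranges` list.

```python
from typing import Dict, List, Sequence


def dense_percents(step: float = 5.0) -> List[float]:
    """Percents to request up front, e.g. 5, 10, ..., 95 for step=5."""
    count = int(round(100 / step)) - 1
    return [step * i for i in range(1, count + 1)]


def pick_boundaries(percentile_values: Sequence[float], wanted: int = 3) -> List[float]:
    """Collapse duplicate percentile values and keep up to `wanted` of them."""
    distinct = sorted(set(percentile_values))
    if len(distinct) <= wanted:
        return distinct
    # Spread the picks evenly over the distinct values.
    return [distinct[(i + 1) * len(distinct) // (wanted + 1)] for i in range(wanted)]


def ranges_from_boundaries(boundaries: Sequence[float]) -> List[Dict[str, float]]:
    """Turn sorted boundaries into open-ended and bounded range entries."""
    if not boundaries:
        return []
    ranges: List[Dict[str, float]] = [{"to": boundaries[0]}]
    for lower, upper in zip(boundaries, boundaries[1:]):
        ranges.append({"from": lower, "to": upper})
    ranges.append({"from": boundaries[-1]})
    return ranges


# Illustrative values for 19 requested percents: mostly one value, short tail.
returned = [0] * 17 + [3, 7]
boundaries = pick_boundaries(returned, wanted=3)
ranges = ranges_from_boundaries(boundaries)
```

This still falls back to fewer buckets when the tail is thinner than the step between requested percents, which is the "improve our chances" caveat above.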