-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Description
Relates to #42949
ValuesSourceConfig has a concept of "unmapped", which the user has specified a field name, but we are unable to resolve that to a mapped field (Note that this is different from the user not specifying a field at all, in which case a script is required or the query is invalid). This might happen, for example, if a user were running an aggregation over multiple indexes, and one didn't have the field. If the user supplied a missing value, VSConfig returns a ValuesSource which returns that missing value and the aggregation proceeds as normal. Otherwise, the aggregation is given an opportunity to optimize for the unmapped case via the createUnmapped method. See TermsAggregatorFactory for an example of where this is used.
There are a couple of problems to fix here. First and foremost, ValuesSourceConfig communicates the unmapped state back to the aggregation builder by returning a null values source. This ends up with a lot of null checking down stream, and the associated cognitive load and potential for missing a spot and triggering an NPE. Instead, VSConfig could expose an isUnmapped predicate, and always return a defined values source (We have static empty implementations that would be ideal here). The type for this values source would be determined by either the user's valueType hint or the default values source type, if the user didn't send a hint. This is the same way we decide the type for an unmapped field with a missing value now.
Additionally, aggregations that don't specialize for the unmapped case (e.g. Histogram) currently just hard code an aggregator to run. More logically, these should still respect the ValuesSourceType as resolved in ValuesSourceConfig. In many cases, createUnmapped could just call doCreateInternal with a ValuesSourceType and an empty (but not null) ValuesSource