Description
We already use fielddata on the _uid field today in order to implement random sorting. However, because doc values are disabled on _uid, fielddata has to be loaded in memory, and since every value of this field is unique, this uses an insane amount of memory.
Having better fielddata for _uid would also be useful in order to have more consistent sort order when paginating or hitting different replicas: we could always add a tie-break on the value of the _uid field.
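To illustrate the tie-break idea, here is a minimal sketch in plain Java. It does not use any Elasticsearch API: `Hit`, `BY_SCORE_THEN_UID` and `sortedUids` are hypothetical names made up for this example. The point is only that once _uid is the secondary sort key, hits with equal primary sort values come back in the same order no matter which order a replica produced them in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieBreakSort {
    // Hypothetical stand-in for a search hit: a primary sort value plus the document's _uid.
    public static final class Hit {
        public final double score;
        public final String uid;

        public Hit(double score, String uid) {
            this.score = score;
            this.uid = uid;
        }
    }

    // Primary key: score, descending. Tie-break: _uid, ascending.
    public static final Comparator<Hit> BY_SCORE_THEN_UID = (a, b) -> {
        int byScore = Double.compare(b.score, a.score);
        return byScore != 0 ? byScore : a.uid.compareTo(b.uid);
    };

    // Returns the _uid values in sorted order without modifying the input list.
    public static List<String> sortedUids(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_SCORE_THEN_UID);
        List<String> uids = new ArrayList<>();
        for (Hit hit : copy) {
            uids.add(hit.uid);
        }
        return uids;
    }
}
```

Since _uid is unique per document, the comparator never returns 0 for two distinct documents, which is what makes the order stable across pages and replicas.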
I think we have several options:
- Option 1: Add SORTED doc values to _uid
- Option 2: Add BINARY doc values to _uid
- Option 3: Add SORTED doc values to _type and _id
- Option 4: Add SORTED doc values to _type and BINARY to _id
Option 2 would probably be wasteful in terms of disk space given that we don't have good compression available for binary doc values (and good compression is hard to implement given that the values can store pretty much anything).
Options 3 and 4 have the benefit of not having to duplicate information if we also want to have doc values on _type and _id: we could even build a BINARY fielddata view for _uid.
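A rough sketch of what such a view could look like, in plain Java rather than against the real fielddata interfaces. `UidView` and its two arrays are hypothetical stand-ins for per-document doc values on _type and _id, and the sketch assumes _uid is laid out as `type#id`.

```java
import java.nio.charset.StandardCharsets;

public class UidView {
    // Hypothetical per-document columns, standing in for doc values on _type and _id.
    private final String[] types;
    private final String[] ids;

    public UidView(String[] types, String[] ids) {
        if (types.length != ids.length) {
            throw new IllegalArgumentException("one _type and one _id per document expected");
        }
        this.types = types;
        this.ids = ids;
    }

    // Builds the _uid bytes for one document on demand; nothing is stored for _uid itself.
    // Assumes the "type#id" layout.
    public byte[] uidBytes(int docId) {
        return (types[docId] + "#" + ids[docId]).getBytes(StandardCharsets.UTF_8);
    }
}
```

The trade-off is a small amount of work per lookup in exchange for not storing the same information twice.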
Then the other question is whether we should use sorted or binary doc values, the former being better for sorting (useful for the consistent sorting use-case) and the latter being better for value lookups (useful for random sorting).
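To make the trade-off concrete, here is a toy in-memory model of the two layouts. This is not how Lucene stores doc values on disk, and `SortedColumn` and `BinaryColumn` are hypothetical names. It only shows the access pattern: a sorted column compares documents by integer ordinal but needs a dictionary hop to return a value, while a binary column returns the value directly but has to compare bytes when sorting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.TreeSet;

public class DocValuesModel {
    // Toy SORTED column: a sorted dictionary of unique terms plus one ordinal per document.
    public static final class SortedColumn {
        private final String[] dictionary; // unique terms in sorted order
        private final int[] ords;          // ords[docId] = index into dictionary

        public SortedColumn(String[] valuesByDoc) {
            dictionary = new TreeSet<>(Arrays.asList(valuesByDoc)).toArray(new String[0]);
            ords = new int[valuesByDoc.length];
            for (int doc = 0; doc < valuesByDoc.length; doc++) {
                ords[doc] = Arrays.binarySearch(dictionary, valuesByDoc[doc]);
            }
        }

        // Sorting compares ordinals only; no term bytes are touched.
        public int compare(int docA, int docB) {
            return Integer.compare(ords[docA], ords[docB]);
        }

        // A value lookup needs an extra hop through the dictionary.
        public String lookup(int docId) {
            return dictionary[ords[docId]];
        }
    }

    // Toy BINARY column: one opaque byte[] per document, no shared dictionary.
    public static final class BinaryColumn {
        private final byte[][] valuesByDoc;

        public BinaryColumn(String[] values) {
            valuesByDoc = new byte[values.length][];
            for (int doc = 0; doc < values.length; doc++) {
                valuesByDoc[doc] = values[doc].getBytes(StandardCharsets.UTF_8);
            }
        }

        // A value lookup is a direct read.
        public byte[] lookup(int docId) {
            return valuesByDoc[docId];
        }

        // Sorting has to compare the bytes themselves (unsigned, byte by byte).
        public int compare(int docA, int docB) {
            byte[] a = valuesByDoc[docA];
            byte[] b = valuesByDoc[docB];
            int len = Math.min(a.length, b.length);
            for (int i = 0; i < len; i++) {
                int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
                if (diff != 0) {
                    return diff;
                }
            }
            return a.length - b.length;
        }
    }
}
```

Note that for a field like _uid, where every value is unique, the dictionary in the sorted model is as large as the number of documents, so the ordinals save comparisons rather than space.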