Give _uid doc values #11887

@jpountz

We already use fielddata on the _uid field today to implement random sorting. However, because doc values are disabled on _uid, that fielddata has to be loaded in memory, and since every value of this field is unique, this uses an insane amount of memory.

Having better fielddata for _uid would also help make sort order more consistent when paginating or hitting different replicas: we could always add a tie-break on the value of the _uid field.
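To illustrate the tie-break idea, here is a minimal sketch, not Elasticsearch code. It assumes hits are `(score, uid)` pairs and that two replicas return equal-scored documents in different internal orders. `sort_hits`, the sample hits, and the `type#id` uid strings are hypothetical and only for illustration.

```python
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, uid); the uid strings below are illustrative


def sort_hits(hits: List[Hit], tie_break_on_uid: bool = False) -> List[Hit]:
    """Sort hits by descending score, optionally breaking ties on uid."""
    if tie_break_on_uid:
        # Equal scores are ordered by uid, so the result no longer
        # depends on the order in which a replica produced the hits.
        return sorted(hits, key=lambda h: (-h[0], h[1]))
    # Python's sort is stable: equal scores keep their input order,
    # which stands in for replica-dependent internal doc order here.
    return sorted(hits, key=lambda h: -h[0])


# Same documents, different internal order on two replicas.
replica_a = [(1.0, "doc#1"), (1.0, "doc#2"), (2.0, "doc#3")]
replica_b = [(1.0, "doc#2"), (2.0, "doc#3"), (1.0, "doc#1")]

# Without a tie-break the two replicas disagree on the order of the ties.
assert sort_hits(replica_a) != sort_hits(replica_b)
# With the uid tie-break both replicas return the same order.
assert sort_hits(replica_a, True) == sort_hits(replica_b, True)
```

Because uids are unique, the combined key is a total order, which is what makes pagination stable across requests.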

I think we have several options:

  • Option 1: Add SORTED doc values to _uid
  • Option 2: Add BINARY doc values to _uid
  • Option 3: Add SORTED doc values to _type and _id
  • Option 4: Add SORTED doc values to _type and BINARY to _id

Option 2 would probably be wasteful in terms of disk space, given that we don't have good compression available for binary doc values (and good compression is hard to implement, given that the values can store pretty much anything).

Options 3 and 4 have the benefit of not having to duplicate information if we also want to have doc values on _type and _id: we could even build a BINARY fielddata view for _uid.
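As a rough sketch of what a _uid view over separate _type and _id columns could look like: the `UidView` class, its per-document lists, and the `#` separator are assumptions made for illustration, not the actual fielddata API.

```python
from typing import List


class UidView:
    """Hypothetical read-only view that derives a uid from two columns."""

    def __init__(self, types: List[bytes], ids: List[bytes],
                 separator: bytes = b"#") -> None:
        # One entry per document in each column; the separator is an assumption.
        assert len(types) == len(ids)
        self.types = types
        self.ids = ids
        self.separator = separator

    def value(self, doc: int) -> bytes:
        # Nothing is stored for the uid itself: it is computed on demand,
        # so the type and id bytes are not duplicated on disk.
        return self.types[doc] + self.separator + self.ids[doc]


view = UidView(types=[b"tweet", b"user"], ids=[b"1", b"42"])
assert view.value(1) == b"user#42"
```

The trade-off is a small amount of work per lookup in exchange for not storing the same bytes twice.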

Then the other question is whether we should use sorted or binary doc values: the former are better for sorting (useful for the consistent-sorting use case) and the latter are better for value lookups (useful for random sorting).
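The trade-off can be sketched with a toy in-memory model. This is not Lucene's on-disk format: `SortedColumn`, `BinaryColumn`, and `random_score` are hypothetical names, and hashing the uid together with a seed is only an assumed stand-in for how random sorting might consume the values.

```python
import hashlib
from typing import List


class SortedColumn:
    """Toy model of sorted doc values: sorted dictionary + per-doc ordinal."""

    def __init__(self, values: List[bytes]) -> None:
        self.terms = sorted(set(values))
        ord_of = {term: i for i, term in enumerate(self.terms)}
        self.ords = [ord_of[v] for v in values]

    def ordinal(self, doc: int) -> int:
        # Comparing two docs for sorting is a plain integer comparison.
        return self.ords[doc]

    def value(self, doc: int) -> bytes:
        # Reading the value costs an extra hop through the dictionary.
        return self.terms[self.ords[doc]]


class BinaryColumn:
    """Toy model of binary doc values: the bytes are stored per document."""

    def __init__(self, values: List[bytes]) -> None:
        self.values = list(values)

    def value(self, doc: int) -> bytes:
        # Direct lookup, but sorting has to compare byte strings.
        return self.values[doc]


def random_score(uid: bytes, seed: int) -> int:
    """Assumed random-sort key: a hash of the seed and the uid bytes."""
    digest = hashlib.sha1(seed.to_bytes(8, "big") + uid).digest()
    return int.from_bytes(digest[:8], "big")


uids = [b"doc#3", b"doc#1", b"doc#2"]
sorted_col, binary_col = SortedColumn(uids), BinaryColumn(uids)

# Both columns give the same sort order, via ordinals or via raw bytes.
by_ord = sorted(range(len(uids)), key=sorted_col.ordinal)
by_bytes = sorted(range(len(uids)), key=binary_col.value)
assert by_ord == by_bytes

# Random sorting needs the value of every doc, not its rank.
keys = [random_score(binary_col.value(doc), seed=7) for doc in range(len(uids))]
```

In this model the sorted column favors comparisons and the binary column favors per-document value reads, which mirrors the two use cases named above.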
