Clarify async search REST parameters #54069

@javanna

When playing with the new async search API I noticed a couple of inconsistencies and potential naming problems that I would like to discuss. It's important to address these now, as the APIs haven't been released yet and they are declared stable in our REST spec.

I asked @karmi for his input to validate my concerns and come up with a proposal. The following are the problems and the changes that we are proposing:

  • wait_for_completion: it indicates how long you are willing to block and wait for results when submitting an async search, effectively turning the async search into a sync one.
    wait_for_completion is used in other APIs, but with type boolean, while in submit async search it is exposed as a number, effectively a timeout. This introduces an inconsistency in our REST API, and it will cause issues for some of the language clients.
    Proposal: rename it to wait_for_results_timeout. This way we include the timeout terminology and we don't reuse the existing wait_for_completion. Also, "results" explains what users are waiting for better than "completion" does.

  • keep_alive: it indicates how long the async search is available within the cluster. When this timeout expires, the search is stopped if it is still running, or its results are purged if it has already completed.
    The keep_alive naming comes from HTTP terminology, where it has to do with connections, while here the semantics are about how long state will be available/stored in the cluster. This could lead to misinterpreting what the parameter does.
    Proposal: rename it to keep_results_timeout. This way we move away from reusing HTTP terminology, and we make it clear that it's also a timeout on how long results will be available. What may still not be entirely clear is that the counting starts when the async search is submitted, not when it completes. Suggestions are welcome.

  • clean_on_completion: it indicates whether results should be discarded, rather than stored, once they are returned within the wait timeout described above.
    Its description contains a double negation that makes it hard to understand. Also, the notion of completion can be confusing, as it's not about whether the search completed but about whether the results were returned within the provided (currently wait_for_completion) timeout. By default, results are not stored when they are returned directly by submit async search. Because it is a boolean, users may think that they can disable storing results at all times, but storing results cannot be disabled, rightly so, when submit async search did not return them within the timeout.
    I considered removing this parameter, because once results have been returned, they can be stored externally. It turns out, though, that this parameter is useful for making tests deterministic, so it makes sense to keep it.
    Proposal: rename it to keep_results and make it an enum with two possible values rather than a boolean: auto (the default behaviour: store results unless submit async search returned them within wait_for_results_timeout) and always (store results for later retrieval even if they have been returned by submit async search within the provided timeout). I find that this better reflects the behaviour of the API and aligns well with the proposed rename of keep_alive to keep_results_timeout above, as the two are related. Note that always does not mean forever: the results will always be cleaned up when their validity expires.

  • The rename of wait_for_completion to wait_for_results_timeout should also be applied to the get async search API, but maybe we should consider whether this parameter is useful when retrieving results? Users that are calling get async search are taking advantage of the async nature of async search, hence while I see why one would block and wait when submitting, I don't see why one would block and wait when retrieving results. Is avoiding an additional call when the search is almost complete a good enough reason to expose this parameter?
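
To make the proposed semantics concrete, here is a minimal sketch in Python. It only models the decision described above (when results get stored) and builds a query string from the proposed parameter names. The helper names `should_store_results` and `submit_query_string` and the example timeout values are made up for illustration; none of this is an existing client API.

```python
from enum import Enum
from urllib.parse import urlencode


class KeepResults(Enum):
    """Proposed replacement for the boolean clean_on_completion."""
    AUTO = "auto"      # store results unless submit already returned them
    ALWAYS = "always"  # store results even if submit already returned them


def should_store_results(keep_results: KeepResults, returned_by_submit: bool) -> bool:
    """Hypothetical helper: would the cluster store results for later retrieval?

    returned_by_submit is True when submit async search returned the results
    within wait_for_results_timeout.
    """
    if keep_results is KeepResults.ALWAYS:
        return True
    # AUTO: storing cannot be disabled when submit did not return the results.
    return not returned_by_submit


def submit_query_string(wait_for_results_timeout: str,
                        keep_results_timeout: str,
                        keep_results: KeepResults = KeepResults.AUTO) -> str:
    """Hypothetical helper: build the query string using the proposed names."""
    return urlencode({
        "wait_for_results_timeout": wait_for_results_timeout,
        "keep_results_timeout": keep_results_timeout,
        "keep_results": keep_results.value,
    })


if __name__ == "__main__":
    # Example values are arbitrary, not defaults of the API.
    print(submit_query_string("1s", "5d"))
    print(should_store_results(KeepResults.AUTO, returned_by_submit=True))
```

With an enum, "results are never stored" is not expressible at all, which matches the point that storing cannot be disabled when submit did not return the results in time.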
