Description
When playing with the new async search API I noticed a couple of inconsistencies and potential naming problems that I would like to discuss. Note that it's important to address these now, as the APIs haven't been released yet and they are declared stable in our REST spec.
I asked @karmi for his input to validate my concerns and come up with a proposal. The following are the problems and the changes that we are proposing:
- `wait_for_completion`: it indicates how long you are willing to block and wait for results when submitting an async search, effectively turning async search into sync.

  `wait_for_completion` is used in other APIs, but with type `boolean`, while it is exposed as a `number`, effectively a timeout, in submit async search. This introduces inconsistency in our REST API, and it will cause issues for some of the language clients.

  Proposal: rename it to `wait_for_results_timeout`: this way we include the timeout terminology and we don't reuse the existing `wait_for_completion`. Also, `results` better explains what it is that users are waiting for compared to `completion`.

- `keep_alive`: it indicates how long the async search is available within the cluster. That means that when this timeout expires, the search will be stopped if still running, or its results will be purged if it has already completed.

  The `keep_alive` naming comes from HTTP terminology, where it has to do with connections, while here the semantics are about how long state will be available/stored in the cluster, which could lead to misinterpreting what the parameter does.

  Proposal: rename it to `keep_results_timeout`: this way we move away from reusing HTTP terminology, and we make it clear that it's also a timeout around how long results will be available. Maybe what is not super clear about this is that the counting starts when the async search is submitted, not when it is completed. Suggestions are welcome.

- `clean_on_completion`: it indicates whether results should not be stored once they are returned within the above described timeout.

  There is some double negation in its description that makes it hard to understand. Also, the notion of `completion` can be confusing, as it's not about whether the search was completed but whether the results were returned within the provided (currently `wait_for_completion`) timeout. By default, results are not stored when they are returned directly by submit async search. Because it is a `boolean`, users may think that they can disable storing results at all times, but storing results cannot be disabled, rightly so, when submit async search did not return them within the timeout.

  I considered removing this parameter, because when results have been returned, they could be stored externally. It turns out though that this parameter is useful to make testing deterministic, and it makes sense to keep it.

  Proposal: rename it to `keep_results` and make it an `enum` rather than a `boolean`, with two possible values: `auto` (the default behaviour: store results unless submit async search returned them within `wait_for_results_timeout`) and `always` (store results for later retrieval even if they have been returned by submit async search within the provided timeout). I find that this better reflects the behaviour of the API and aligns well with the above proposed rename of `keep_alive` to `keep_results_timeout`, as they are somewhat related. Note that `always` does not mean forever: the results will always be cleaned up when their validity expires.

- The rename of `wait_for_completion` to `wait_for_results_timeout` should also be applied to the get async search API, but maybe we should consider whether this parameter is useful when retrieving results? Users that are calling get async search are taking advantage of the async nature of async search, hence while I see why one would block and wait when submitting, I don't see why one would block and wait when retrieving results. Is avoiding an additional call when the search is almost complete a good enough reason to expose this parameter?
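To make the proposed semantics for the submit parameters easier to discuss, here is a minimal Python sketch of how `keep_results` and `keep_results_timeout` would behave as described above. It is only a model of the proposal, not Elasticsearch code: the `KeepResults` enum and the `stores_results` and `expires_at` helpers are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta
from enum import Enum


class KeepResults(Enum):
    """Hypothetical model of the proposed `keep_results` enum."""
    AUTO = "auto"      # proposed default
    ALWAYS = "always"


def stores_results(keep_results: KeepResults, returned_within_wait: bool) -> bool:
    """Whether results stay available in the cluster for later retrieval.

    `returned_within_wait` is True when submit async search returned the
    results directly, i.e. within the proposed `wait_for_results_timeout`.
    """
    if keep_results is KeepResults.ALWAYS:
        return True
    # AUTO: results already handed to the caller are not stored; results
    # that were not returned in time are stored, and this cannot be disabled.
    return not returned_within_wait


def expires_at(submitted_at: datetime, keep_results_timeout: timedelta) -> datetime:
    """Counting starts when the search is submitted, not when it completes."""
    return submitted_at + keep_results_timeout
```

Under this model there is no combination that disables storing results when they were not returned in time, which is the point of replacing the `boolean` with an `enum`.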