Commit efbde5d

dmeiss authored and danielmitterdorfer committed
Edits to text in Update By Query API doc (#39078)
1 parent 763fde7 commit efbde5d

1 file changed: +30 -29 lines changed

docs/reference/docs/update-by-query.asciidoc

Lines changed: 30 additions & 29 deletions
@@ -39,9 +39,9 @@ That will return something like this:
 // TESTRESPONSE[s/"took" : 147/"took" : "$body.took"/]
 
 `_update_by_query` gets a snapshot of the index when it starts and indexes what
-it finds using `internal` versioning. That means that you'll get a version
+it finds using `internal` versioning. That means you'll get a version
 conflict if the document changes between the time when the snapshot was taken
-and when the index request is processed. When the versions match the document
+and when the index request is processed. When the versions match, the document
 is updated and the version number is incremented.
 
 NOTE: Since `internal` versioning does not support the value 0 as a valid
@@ -55,10 +55,10 @@ aborted. While the first failure causes the abort, all failures that are
 returned by the failing bulk request are returned in the `failures` element; therefore
 it's possible for there to be quite a few failed entities.
 
-If you want to simply count version conflicts not cause the `_update_by_query`
-to abort you can set `conflicts=proceed` on the url or `"conflicts": "proceed"`
+If you want to simply count version conflicts, and not cause the `_update_by_query`
+to abort, you can set `conflicts=proceed` on the url or `"conflicts": "proceed"`
 in the request body. The first example does this because it is just trying to
-pick up an online mapping change and a version conflict simply means that the
+pick up an online mapping change, and a version conflict simply means that the
 conflicting document was updated between the start of the `_update_by_query`
 and the time when it attempted to update the document. This is fine because
 that update will have picked up the online mapping update.
@@ -92,7 +92,7 @@ POST twitter/_update_by_query?conflicts=proceed
 
 <1> The query must be passed as a value to the `query` key, in the same
 way as the <<search-search,Search API>>. You can also use the `q`
-parameter in the same way as the search api.
+parameter in the same way as the search API.
 
 So far we've only been updating documents without changing their source. That
 is genuinely useful for things like
@@ -121,7 +121,7 @@ POST twitter/_update_by_query
 Just as in <<docs-update,Update API>> you can set `ctx.op` to change the
 operation that is executed:
 
-
+[horizontal]
 `noop`::
 
 Set `ctx.op = "noop"` if your script decides that it doesn't have to make any
@@ -199,12 +199,12 @@ POST twitter/_update_by_query?pipeline=set-foo
 === URL Parameters
 
 In addition to the standard parameters like `pretty`, the Update By Query API
-also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`
+also supports `refresh`, `wait_for_completion`, `wait_for_active_shards`, `timeout`,
 and `scroll`.
 
 Sending the `refresh` will update all shards in the index being updated when
 the request completes. This is different than the Update API's `refresh`
-parameter which causes just the shard that received the new data to be indexed.
+parameter, which causes just the shard that received the new data to be indexed.
 Also unlike the Update API it does not support `wait_for`.
 
 If the request contains `wait_for_completion=false` then Elasticsearch will
@@ -219,12 +219,12 @@ Elasticsearch can reclaim the space it uses.
 before proceeding with the request. See <<index-wait-for-active-shards,here>>
 for details. `timeout` controls how long each write request waits for unavailable
 shards to become available. Both work exactly how they work in the
-<<docs-bulk,Bulk API>>. As `_update_by_query` uses scroll search, you can also specify
+<<docs-bulk,Bulk API>>. Because `_update_by_query` uses scroll search, you can also specify
 the `scroll` parameter to control how long it keeps the "search context" alive,
-eg `?scroll=10m`, by default it's 5 minutes.
+e.g. `?scroll=10m`. By default it's 5 minutes.
 
 `requests_per_second` can be set to any positive decimal number (`1.4`, `6`,
-`1000`, etc) and throttles rate at which `_update_by_query` issues batches of
+`1000`, etc.) and throttles the rate at which `_update_by_query` issues batches of
 index operations by padding each batch with a wait time. The throttling can be
 disabled by setting `requests_per_second` to `-1`.
 
@@ -240,7 +240,7 @@ target_time = 1000 / 500 per second = 2 seconds
 wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
 --------------------------------------------------
 
-Since the batch is issued as a single `_bulk` request large batch sizes will
+Since the batch is issued as a single `_bulk` request, large batch sizes will
 cause Elasticsearch to create many requests and then wait for a while before
 starting the next set. This is "bursty" instead of "smooth". The default is `-1`.
 
@@ -283,6 +283,7 @@ The JSON response looks like this:
 --------------------------------------------------
 // TESTRESPONSE[s/"took" : 147/"took" : "$body.took"/]
 
+[horizontal]
 `took`::
 
 The number of milliseconds from start to end of the whole operation.
@@ -319,8 +320,8 @@ the update by query returned a `noop` value for `ctx.op`.
 
 `retries`::
 
-The number of retries attempted by update-by-query. `bulk` is the number of bulk
-actions retried and `search` is the number of search actions retried.
+The number of retries attempted by update by query. `bulk` is the number of bulk
+actions retried, and `search` is the number of search actions retried.
 
 `throttled_millis`::
 
@@ -341,8 +342,8 @@ executed again in order to conform to `requests_per_second`.
 
 Array of failures if there were any unrecoverable errors during the process. If
 this is non-empty then the request aborted because of those failures.
-Update-by-query is implemented using batches and any failure causes the entire
-process to abort but all failures in the current batch are collected into the
+Update by query is implemented using batches. Any failure causes the entire
+process to abort, but all failures in the current batch are collected into the
 array. You can use the `conflicts` option to prevent reindex from aborting on
 version conflicts.
 
@@ -352,7 +353,7 @@ version conflicts.
 [[docs-update-by-query-task-api]]
 === Works with the Task API
 
-You can fetch the status of all running update-by-query requests with the
+You can fetch the status of all running update by query requests with the
 <<tasks,Task API>>:
 
 [source,js]
@@ -406,7 +407,7 @@ The responses looks like:
 --------------------------------------------------
 // TESTRESPONSE
 
-<1> this object contains the actual status. It is just like the response json
+<1> This object contains the actual status. It is just like the response JSON
 with the important addition of the `total` field. `total` is the total number
 of operations that the reindex expects to perform. You can estimate the
 progress by adding the `updated`, `created`, and `deleted` fields. The request
@@ -424,7 +425,7 @@ GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619
 
 The advantage of this API is that it integrates with `wait_for_completion=false`
 to transparently return the status of completed tasks. If the task is completed
-and `wait_for_completion=false` was set on it them it'll come back with a
+and `wait_for_completion=false` was set on it, then it'll come back with a
 `results` or an `error` field. The cost of this feature is the document that
 `wait_for_completion=false` creates at `.tasks/task/${taskId}`. It is up to
 you to delete that document.
@@ -434,7 +435,7 @@ you to delete that document.
 [[docs-update-by-query-cancel-task-api]]
 === Works with the Cancel Task API
 
-Any Update By Query can be canceled using the <<tasks,Task Cancel API>>:
+Any update by query can be cancelled using the <<tasks,Task Cancel API>>:
 
 [source,js]
 --------------------------------------------------
@@ -464,25 +465,25 @@ POST _update_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_seco
 
 The task ID can be found using the <<tasks, tasks API>>.
 
-Just like when setting it on the `_update_by_query` API `requests_per_second`
+Just like when setting it on the `_update_by_query` API, `requests_per_second`
 can be either `-1` to disable throttling or any decimal number
 like `1.7` or `12` to throttle to that level. Rethrottling that speeds up the
-query takes effect immediately but rethrotting that slows down the query will
-take effect on after completing the current batch. This prevents scroll
+query takes effect immediately, but rethrotting that slows down the query will
+take effect after completing the current batch. This prevents scroll
 timeouts.
 
 [float]
 [[docs-update-by-query-slice]]
 === Slicing
 
-Update-by-query supports <<sliced-scroll>> to parallelize the updating process.
+Update by query supports <<sliced-scroll>> to parallelize the updating process.
 This parallelization can improve efficiency and provide a convenient way to
 break the request down into smaller parts.
 
 [float]
 [[docs-update-by-query-manual-slice]]
 ==== Manual slicing
-Slice an update-by-query manually by providing a slice id and total number of
+Slice an update by query manually by providing a slice id and total number of
 slices to each request:
 
 [source,js]
@@ -537,7 +538,7 @@ Which results in a sensible `total` like this one:
 [[docs-update-by-query-automatic-slice]]
 ==== Automatic slicing
 
-You can also let update-by-query automatically parallelize using
+You can also let update by query automatically parallelize using
 <<sliced-scroll>> to slice on `_uid`. Use `slices` to specify the number of
 slices to use:
 
@@ -599,8 +600,8 @@ be larger than others. Expect larger slices to have a more even distribution.
 are distributed proportionally to each sub-request. Combine that with the point
 above about distribution being uneven and you should conclude that the using
 `size` with `slices` might not result in exactly `size` documents being
-`_update_by_query`ed.
-* Each sub-requests gets a slightly different snapshot of the source index
+updated.
+* Each sub-request gets a slightly different snapshot of the source index
 though these are all taken at approximately the same time.
 
 [float]
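
Reviewer note: the throttling arithmetic quoted in the hunk at line 240 (`target_time = 1000 / 500 per second = 2 seconds`, `wait_time = target_time - write_time`) can be checked with a small sketch. This is not Elasticsearch code. The function name `throttle_wait_seconds` and its parameter names are hypothetical, and clamping the result at zero when a batch takes longer than its target time is an assumption the doc text does not state.

```python
def throttle_wait_seconds(batch_size, requests_per_second, write_seconds):
    """Return the padding, in seconds, added after one batch.

    Mirrors the doc's arithmetic:
        target_time = batch_size / requests_per_second
        wait_time   = target_time - write_time
    """
    if requests_per_second == -1:
        # The doc says -1 disables throttling, so no padding is added.
        return 0.0
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive or -1")
    target_seconds = batch_size / requests_per_second
    # Assumption: never return a negative wait if the write overran the target.
    return max(0.0, target_seconds - write_seconds)


if __name__ == "__main__":
    # The doc's worked example: batch of 1000, 500 requests per second,
    # and a write that took 0.5 seconds.
    print(throttle_wait_seconds(1000, 500, 0.5))  # 1.5
```

With the doc's numbers the result matches the documented `wait_time` of 1.5 seconds.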
