
Commit 083cf32

[Doc]: fix typos in various files (#28863)
Signed-off-by: Didier Durand <[email protected]>
Parent: bf9e1e8

File tree

6 files changed: +7, -7 lines


docs/contributing/profiling.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -224,6 +224,6 @@ snakeviz expensive_function.prof
 
 Leverage VLLM_GC_DEBUG environment variable to debug GC costs.
 
-- VLLM_GC_DEBUG=1: enable GC debugger with gc.collect elpased times
+- VLLM_GC_DEBUG=1: enable GC debugger with gc.collect elapsed times
 - VLLM_GC_DEBUG='{"top_objects":5}': enable GC debugger to log top 5
   collected objects for each gc.collect
```
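
For context, a minimal sketch of the measurement this flag is described as making (the elapsed time of each `gc.collect()` call), assuming nothing about vLLM's internal debugger beyond the text above:

```python
import gc
import time

# Minimal illustration of timing a gc.collect() call, as the
# VLLM_GC_DEBUG=1 description above suggests; this is not vLLM's
# actual debugger implementation.
start = time.perf_counter()
collected = gc.collect()
elapsed_ms = (time.perf_counter() - start) * 1e3
print(f"gc.collect() freed {collected} objects in {elapsed_ms:.2f} ms")
```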

docs/design/io_processor_plugins.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 # IO Processor Plugins
 
-IO Processor plugins are a feature that allows pre and post processing of the model input and output for pooling models. The idea is that users are allowed to pass a custom input to vLLM that is converted into one or more model prompts and fed to the model `encode` method. One potential use-case of such plugins is that of using vLLM for generating multi-modal data. Say users feed an image to vLLM and get an image in output.
+IO Processor plugins are a feature that allows pre- and post-processing of the model input and output for pooling models. The idea is that users are allowed to pass a custom input to vLLM that is converted into one or more model prompts and fed to the model `encode` method. One potential use-case of such plugins is that of using vLLM for generating multi-modal data. Say users feed an image to vLLM and get an image in output.
 
 When performing an inference with IO Processor plugins, the prompt type is defined by the plugin and the same is valid for the final request output. vLLM does not perform any validation of input/output data, and it is up to the plugin to ensure the correct data is being fed to the model and returned to the user. As of now these plugins support only pooling models and can be triggered via the `encode` method in `LLM` and `AsyncLLM`, or in online serving mode via the `/pooling` endpoint.
```

docs/design/logits_processors.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -411,7 +411,7 @@ Logits processor `update_state()` implementations should assume the following mo
 
 * **"Condense" the batch to be contiguous:** starting with the lowest-index empty slot (which was caused by a Remove), apply a Unidirectional Move from the current highest non-empty slot in the batch to fill the empty slot. Proceed with additional Unidirectional Move operations in order of increasing empty slot destination index and decreasing non-empty slot source index until the batch is contiguous
 
-* **Shrink the batch:** a side-effect of condensing the batch is that empty slots resulting from Remove operations are grouped in a contiguous block at the end of the batch array. Thus, after condensing, update `BatchUpdate.batch_size` to reflect the number of non-empty slots
+* **Shrink the batch:** a side effect of condensing the batch is that empty slots resulting from Remove operations are grouped in a contiguous block at the end of the batch array. Thus, after condensing, update `BatchUpdate.batch_size` to reflect the number of non-empty slots
 
 5. Reorder the batch for improved efficiency. Depending on the attention backend implementation and the current characteristics of the batch, zero or more Swap Move operations may be applied to reorder the batch
@@ -548,7 +548,7 @@ Built-in logits processors are always loaded when the vLLM engine starts. See th
 
 Review these logits processor implementations for guidance on writing built-in logits processors.
 
-Additionally, the following logits-processor-like functionalities are hard-coded into the sampler and do not yet utilize the programming model described above. Most of them will be refactored to use the aforemented logits processor programming model.
+Additionally, the following logits-processor-like functionalities are hard-coded into the sampler and do not yet utilize the programming model described above. Most of them will be refactored to use the aforementioned logits processor programming model.
 
 * Allowed token IDs
```
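
The condense-and-shrink steps touched by the first hunk can be illustrated with a minimal sketch, assuming the batch is a plain Python list whose empty slots are `None` (vLLM's real batch structures are more involved):

```python
# Condense: fill the lowest empty slot from the highest non-empty slot,
# repeating until contiguous. Shrink: the new batch_size is the number
# of non-empty slots, now grouped at the front.
def condense_and_shrink(slots: list) -> tuple[list, int]:
    empties = [i for i, s in enumerate(slots) if s is None]            # ascending
    sources = [i for i, s in enumerate(slots) if s is not None][::-1]  # descending
    for dst, src in zip(empties, sources):
        if dst >= src:
            break  # everything before dst is already contiguous
        slots[dst], slots[src] = slots[src], None  # unidirectional move
    batch_size = sum(s is not None for s in slots)
    return slots, batch_size

print(condense_and_shrink(["a", None, "b", None, "c"]))
# (['a', 'c', 'b', None, None], 3)
```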

docs/features/disagg_prefill.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -91,6 +91,6 @@ Disaggregated prefilling is highly related to infrastructure, so vLLM relies on
 
 We recommend three ways of implementations:
 
-- **Fully-customized connector**: Implement your own `Connector`, and call third-party libraries to send and receive KV caches, and many many more (like editing vLLM's model input to perform customized prefilling, etc). This approach gives you the most control, but at the risk of being incompatible with future vLLM versions.
+- **Fully-customized connector**: Implement your own `Connector`, and call third-party libraries to send and receive KV caches, and many many more (like editing vLLM's model input to perform customized prefilling, etc.). This approach gives you the most control, but at the risk of being incompatible with future vLLM versions.
 - **Database-like connector**: Implement your own `LookupBuffer` and support the `insert` and `drop_select` APIs just like SQL.
 - **Distributed P2P connector**: Implement your own `Pipe` and support the `send_tensor` and `recv_tensor` APIs, just like `torch.distributed`.
```
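
A hedged sketch of the interface shapes behind the last two options: the method names (`insert`, `drop_select`, `send_tensor`, `recv_tensor`) come from the list above, while the signatures are illustrative assumptions:

```python
from abc import ABC, abstractmethod
from typing import Any, Optional

# Sketch only: method names are from the doc above; the signatures
# are assumptions, not vLLM's actual abstract base classes.
class LookupBuffer(ABC):
    @abstractmethod
    def insert(self, key: Any, kv_cache: Any) -> None: ...

    @abstractmethod
    def drop_select(self, key: Any) -> Optional[Any]: ...

class Pipe(ABC):
    @abstractmethod
    def send_tensor(self, tensor: Any) -> None: ...

    @abstractmethod
    def recv_tensor(self) -> Any: ...
```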

docs/features/lora.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@ This document shows you how to use [LoRA adapters](https://arxiv.org/abs/2106.09
 
 LoRA adapters can be used with any vLLM model that implements [SupportsLoRA][vllm.model_executor.models.interfaces.SupportsLoRA].
 
-Adapters can be efficiently served on a per request basis with minimal overhead. First we download the adapter(s) and save
+Adapters can be efficiently served on a per-request basis with minimal overhead. First we download the adapter(s) and save
 them locally with
 
 ```python
```
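
The contents of the fenced block are not shown in this hunk. As a hedged stand-in for a local download step, `huggingface_hub.snapshot_download` could be used; the adapter repo id below is illustrative, not taken from the doc:

```python
# Stand-in only: the actual snippet in lora.md is elided from this diff.
# snapshot_download is a real huggingface_hub API; the repo id is made up.
from huggingface_hub import snapshot_download

adapter_path = snapshot_download(repo_id="some-org/some-lora-adapter")
print(adapter_path)  # local directory containing the adapter weights
```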

vllm/lora/ops/triton_ops/fused_moe_lora_op.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -154,7 +154,7 @@ def _fused_moe_lora_kernel(
         k_remaining = K - k * (BLOCK_SIZE_K * SPLIT_K)
         # pre-fetch lora weight
         b = tl.load(b_ptrs, mask=offs_k[:, None] < k_remaining, other=0.0)
-        # GDC wait waits for ALL programs in the the prior kernel to complete
+        # GDC wait waits for ALL programs in the prior kernel to complete
         # before continuing.
         if USE_GDC and not IS_PRIMARY:
             tl.extra.cuda.gdc_wait()
```
