
Conversation

@ggerganov (Member)

fix #12860

Embedding models require the entire prompt to fit in a single micro-batch, so we simply force:

```
n_ubatch = n_batch = n_ctx
```
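For illustration, here is a minimal C++ sketch of that constraint, assuming the clamp is applied when building the context parameters through the public `llama.h` API (the helper `make_embedding_cparams` is hypothetical, not the actual patch):

```cpp
// Hedged sketch, not the actual change in this PR: construct context
// parameters for an embedding model so that n_ubatch = n_batch = n_ctx.
// The field names (n_ctx, n_batch, n_ubatch, embeddings) come from the
// public llama.h API.
#include "llama.h"

static llama_context_params make_embedding_cparams(uint32_t n_ctx) {
    llama_context_params cparams = llama_context_default_params();

    cparams.embeddings = true;  // pooled-embedding mode
    cparams.n_ctx      = n_ctx;

    // An embedding model must see the whole prompt in one micro-batch,
    // so force the batch and micro-batch sizes up to the context size.
    cparams.n_batch    = n_ctx;
    cparams.n_ubatch   = n_ctx;

    return cparams;
}
```

With `n_batch` forced equal to `n_ctx`, the failing assertion `GGML_ASSERT(params.n_batch >= params.n_ctx)` reported in #12860 holds by construction.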

@ggerganov merged commit 226251e into master on Apr 24, 2025
56 checks passed
@ggerganov deleted the gg/embeddings-fix-batch branch on April 24, 2025 at 19:29
pockers21 pushed a commit to pockers21/llama.cpp that referenced this pull request Apr 28, 2025

Development

Successfully merging this pull request may close these issues.

Misc. bug: llama-embedding asserts: GGML_ASSERT(params.n_batch >= params.n_ctx);
