Skip to content

Conversation

JohannesGaessler
Copy link
Collaborator

While working on #6828 I noticed that when using a large static n-ngam cache the benchmark would report 0 iterations for the first 8 minutes and then 30 iterations for the last 2 minutes. What seems to be happening is that bench.py doesn't correctly wait for the server to be ready so the clock starts ticking even while the n-gram cache is still being loaded. From what I can tell loading the model from disk can have the same issue if it's e.g. on an HDD.

This PR makes it so that bench.py waits for response 200 (SERVER_STATE_READY) from the health endpoint for checking whether the server is actually ready. I'm not sure if there is a better way to implement this than what I did; I'm definitely open to suggestions.

@JohannesGaessler JohannesGaessler requested a review from phymbert May 14, 2024 13:34
@mofosyne mofosyne added examples Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix python python script changes labels May 14, 2024
@JohannesGaessler JohannesGaessler merged commit 583fd6b into ggml-org:master May 15, 2024
@ggerganov
Copy link
Member

It looks like this change causes the server Benchmark that we run on the self-hosted runner to fail like this:

https://github.com/ggerganov/llama.cpp/actions/runs/9094073377/job/24998422481

I tried to revert it and now the benchmark passes:

https://github.com/ggerganov/llama.cpp/actions/runs/9112533114

I'm not sure why it is causing the error - any ideas how to fix?

@phymbert
Copy link
Collaborator

Yes, the problem is here:

https://github.com/ggerganov/llama.cpp/blob/9afdffe70ebf3166d429b4434783bb0b7f97bdeb/examples/server/bench/bench.py#L113

It considers prometheus not started, which is not working as expected. Probably easier to revert and separate in another PR prometheus check vs llama.cpp server checks ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
examples python python script changes Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants