server bench: fix bench not waiting for model load #7284

JohannesGaessler · 2024-05-14T13:34:52Z

While working on #6828, I noticed that when using a large static n-gram cache, the benchmark would report 0 iterations for the first 8 minutes and then 30 iterations for the last 2 minutes. What seems to be happening is that bench.py doesn't correctly wait for the server to be ready, so the clock starts ticking while the n-gram cache is still being loaded. From what I can tell, loading the model from disk can cause the same issue, e.g. if it's on an HDD.

This PR makes bench.py wait for a 200 response (SERVER_STATE_READY) from the health endpoint to check whether the server is actually ready. I'm not sure if there is a better way to implement this than what I did; I'm definitely open to suggestions.
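For illustration, the general idea of polling the health endpoint until it returns 200 could be sketched as follows. This is not the code from this PR: the names `fetch_status` and `wait_until_ready`, the timeout values, and the polling interval are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=2.0):
    """Return the HTTP status code of a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (not ready yet).
        return err.code
    except (urllib.error.URLError, OSError):
        # The server is not accepting connections yet.
        return None


def wait_until_ready(url, max_wait=600.0, interval=0.5,
                     get_status=fetch_status, sleep=time.sleep):
    """Poll url until it answers 200; return False if max_wait elapses first.

    get_status and sleep are injectable (hypothetical parameters) so the
    loop can be exercised without a running server.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if get_status(url) == 200:
            return True
        sleep(interval)
    return False
```

A benchmark script would start its timer only after a call such as `wait_until_ready("http://localhost:8080/health")` returns `True`; the host and port here are placeholders.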

ggerganov · 2024-05-16T14:42:21Z

It looks like this change causes the server Benchmark that we run on the self-hosted runner to fail like this:

https://github.com/ggerganov/llama.cpp/actions/runs/9094073377/job/24998422481

I tried to revert it and now the benchmark passes:

https://github.com/ggerganov/llama.cpp/actions/runs/9112533114

I'm not sure why it is causing the error - any ideas how to fix?

phymbert · 2024-05-16T18:09:05Z

Yes, the problem is here:

https://github.com/ggerganov/llama.cpp/blob/9afdffe70ebf3166d429b4434783bb0b7f97bdeb/examples/server/bench/bench.py#L113

The check there concludes that Prometheus has not started, so it is not working as expected. It is probably easier to revert and, in another PR, separate the Prometheus check from the llama.cpp server checks.

server bench: fix bench not waiting for model load

f692dbd

JohannesGaessler requested a review from phymbert May 14, 2024 13:34

phymbert approved these changes May 14, 2024

mofosyne added the examples, Review Complexity : Low, and python labels May 14, 2024

JohannesGaessler merged commit 583fd6b into ggml-org:master May 15, 2024

phymbert added a commit that referenced this pull request May 16, 2024

Revert "server bench: fix bench not waiting for model load (#7284)"

e7f7bef

This reverts commit 583fd6b.

phymbert mentioned this pull request May 16, 2024

Revert "server bench: fix bench not waiting for model load" #7334

Merged

phymbert added a commit that referenced this pull request May 16, 2024

Revert "server bench: fix bench not waiting for model load (#7284)" (#…

24ecb58

…7334) This reverts commit 583fd6b.
