
Conversation

@eaplatanios
Contributor

This was broken in the 0.5.3 release when these signal calls were introduced. It results in the following error when deploying on our machines:

ValueError: signal only works in main thread of the main interpreter

The fix is borrowed from here.

It would be great if you could cut a 0.5.3.post2 release after this is merged to unblock us (and I assume others as well) from using vLLM with Llama 3.1. Thank you!
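
For reference, the change is essentially the standard guard of only registering signal handlers when running on the main thread. A rough sketch of the pattern (with an illustrative helper name, not the exact diff):

import signal
import threading

def _register_shutdown_handlers(handler):
    # signal.signal() raises ValueError outside the main thread, so only
    # register the handlers when we are actually running on it.
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGINT, handler)
        signal.signal(signal.SIGTERM, handler)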

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run the full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@eaplatanios eaplatanios changed the title from "[Easy] Fixed a bug in the multiprocessing GPU executor." to "[Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor." on Jul 25, 2024
@youkaichao
Member

Hi, why would this be a problem? When would this code be called from another thread?

@eaplatanios
Contributor Author

eaplatanios commented Jul 25, 2024

We are using uvicorn to start our service (with no concurrency) and initializing a vLLM engine using vllm.AsyncLLMEngine.from_engine_args(), so nothing special. I was thinking that vLLM probably starts multiple processes behind the scenes and that they all run some initialization code, including this part, but I haven't looked into the vLLM internals to know whether that's the case.
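
For context, the engine initialization on our side is roughly the following (the model name is just a placeholder):

from vllm import AsyncEngineArgs, AsyncLLMEngine

# Build the async engine from plain engine args; nothing custom on top.
engine_args = AsyncEngineArgs(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
engine = AsyncLLMEngine.from_engine_args(engine_args)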

@eaplatanios
Contributor Author

I looked into this a bit more, and the main issue is that we have a FastAPI service and we initialize the vLLM engine in a separate thread so that the API can still respond with something like "Model is being loaded" while the model is loading, which can take a while for some models. It looks something like this:

import asyncio
from concurrent.futures import ThreadPoolExecutor

loop = asyncio.get_event_loop()
loop.set_default_executor(ThreadPoolExecutor())
# load_model constructs the vLLM async engine off the event loop's thread.
model_download_task = loop.run_in_executor(None, load_model)
model_download_task.add_done_callback(done_callback)

where load_model is a function that internally constructs a vLLM async engine at some point. Do you have any advice on how to work around this issue? It only popped up now because these signal handlers were added in the 0.5.3 release.

@eaplatanios
Contributor Author

@youkaichao I believe the change introduced in this PR should be sufficient to address this use case and should be innocuous for other use cases, but I'm not sure whether there are consequences I'm not aware of.

@youkaichao
Member

Got it. So you are indeed calling it from another thread. This is not the common use case.

I can accept this change, but since this is not common usage, we will not cut a release just for it. It can go into the next release.

@eaplatanios
Contributor Author

That is understandable, thank you! Do you have an estimated timeline for the next release?

@youkaichao
Member

We usually follow a bi-weekly release cadence.

@youkaichao
Member

LGTM!

@youkaichao youkaichao merged commit 084a01f into vllm-project:main Jul 26, 2024
@eaplatanios
Contributor Author

Sounds good, thank you!

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025