feat: parallelized for-loop in lfilter #1557
Conversation
update to new version
Accommodate changes in lfilter
Rebase to master
Get synced
Hi @yoyololicon Thanks for working on this. And thanks for benchmarking it as well.
I just type …
I do not have a definitive answer. Probably you can try passing something like … If you check the configuration of your PyTorch, then you might get a clue. I think in the above case I need to install Intel OpenMP and pass the compilation flag.
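As a side note on the configuration check mentioned above: below is a minimal sketch, assuming a C++ program linked against libtorch, of how the parallel settings of the PyTorch/ATen build can be inspected. `at::get_parallel_info()` reports the same information that `torch.__config__.parallel_info()` prints from Python.

```cpp
// Minimal sketch: print ATen's parallel backend (OpenMP / native thread
// pool), the OpenMP/MKL settings it was built with, and the active
// thread counts.
#include <ATen/Parallel.h>
#include <iostream>

int main() {
  std::cout << at::get_parallel_info() << std::endl;
  return 0;
}
```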
BTW, on a separate note, I was wondering: since the core of …
Thanks for the information, will look into this later.
I assume you are talking about the FIR part of the `lfilter`.
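The comment above is cut off, but it appears to refer to computing the feedforward (FIR) part of `lfilter`, y[n] = Σ_k b[k]·x[n−k], with an existing vectorized op. Below is a hedged sketch of that idea, not the code in this PR; the tensor names and shapes are assumptions for illustration.

```cpp
// Hedged sketch: the FIR half of the difference equation
// y[n] = sum_k b[k] * x[n - k], written as a 1-D convolution.
// `waveform` is assumed to be (n_channels, n_frames) and `b_coeffs`
// a 1-D tensor of n_order feedforward coefficients.
#include <ATen/ATen.h>

at::Tensor fir_part(const at::Tensor& waveform, const at::Tensor& b_coeffs) {
  const int64_t n_order = b_coeffs.size(0);
  // conv1d computes a cross-correlation, so flip the taps to get a
  // true convolution.
  auto kernel = b_coeffs.flip({0}).view({1, 1, n_order});
  // Left-pad with n_order - 1 zeros so the output is causal and has the
  // same length as the input.
  auto padded = at::constant_pad_nd(waveform.unsqueeze(1), {n_order - 1, 0});
  return at::conv1d(padded, kernel).squeeze(1);
}
```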
After adding `set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fopenmp ${TORCH_CXX_FLAGS}")`, it does gain some speed, especially when the number of threads is 4; when threads = 1 it introduces some extra overhead.
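For the threads = 1 vs. threads = 4 comparison above, the intra-op thread count that `at::parallel_for` uses can be pinned from C++ before timing. A minimal sketch; the benchmark body itself would be the script referenced in #1441.

```cpp
// Minimal sketch: pin ATen's intra-op thread count before timing,
// e.g. at::set_num_threads(1) vs. at::set_num_threads(4).
#include <ATen/Parallel.h>

int main() {
  at::set_num_threads(4);  // compare against 1 to see the serial overhead
  // ... run the lfilter benchmark here ...
  return 0;
}
```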
Hi @yoyololicon I recently learned that …
@mthrok |
Hi @yoyololicon Sorry for the late response. You are right; the code inside of …
Can you resolve the conflict?
@yoyololicon Thanks!
Relates to #1476.
Replacing the for-loop along the channel dimension with `at::parallel_for` should gain some speed. To benchmark the changes, one can use the same script I presented in #1441 (comment).
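A minimal sketch of the replacement described above: the serial per-channel loop becomes a call to `at::parallel_for`, which splits the channel range across threads. The function and tensor names here are illustrative assumptions, not the PR's actual diff.

```cpp
#include <ATen/ATen.h>
#include <ATen/Parallel.h>

// Hypothetical helper standing in for the per-channel lfilter recursion;
// the body is a placeholder for the real filtering work.
void process_channel(at::Tensor channel) {
  channel.mul_(1.0);  // placeholder for the real IIR recursion
}

void filter_all_channels(at::Tensor waveform) {
  const int64_t n_channels = waveform.size(0);
  // Before: for (int64_t c = 0; c < n_channels; ++c) process_channel(waveform[c]);
  // After: split [0, n_channels) across threads. Each channel is
  // independent, so a grain size of 1 is safe.
  at::parallel_for(0, n_channels, 1, [&](int64_t begin, int64_t end) {
    for (int64_t c = begin; c < end; ++c) {
      process_channel(waveform[c]);
    }
  });
}
```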