Multithread forest application to matrices #175
Codecov Report
@@ Coverage Diff @@
## master #175 +/- ##
=======================================
Coverage 89.51% 89.51%
=======================================
Files 10 10
Lines 992 992
=======================================
Hits 888 888
Misses 104 104
For prediction in a forest, I agree that it makes sense to always use multithreading. @OkonSamuel Your thoughts? (And can you please review?) There are possibilities for multithreading in training both forests and an individual tree, but let's leave that for a separate PR. In that case we might want to make the mode of acceleration switchable (between |
@@ -271,7 +271,7 @@ end
 function apply_forest(forest::Ensemble{S, T}, features::AbstractMatrix{S}) where {S, T}
     N = size(features,1)
     predictions = Array{T}(undef, N)
-    for i in 1:N
+    Threads.@threads for i in 1:N
I think we should check for the case of Threads.nthreads() == 1. In that case, because of task overhead, the single-threaded implementation is better.
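A minimal sketch of the suggested check, assuming a hypothetical per-row predictor `predict_row` in place of the real `apply_forest` method for vectors (the names `predict_row` and `predict_rows` are illustrative and not part of this PR):

```julia
# Hypothetical stand-in for the per-row predictor; not DecisionTree.jl's apply_forest.
predict_row(row::AbstractVector{<:Real}) = sum(row) > 0 ? 1 : 0

function predict_rows(features::AbstractMatrix{<:Real})
    N = size(features, 1)
    predictions = Vector{Int}(undef, N)
    if Threads.nthreads() > 1
        # Each iteration writes to a distinct index, so no locking is needed.
        Threads.@threads for i in 1:N
            predictions[i] = predict_row(features[i, :])
        end
    else
        # Plain loop: skips task setup when only one thread is available.
        for i in 1:N
            predictions[i] = predict_row(features[i, :])
        end
    end
    return predictions
end
```

Both branches produce the same predictions; only the scheduling differs, so the check changes performance and not results.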
@@ -271,7 +271,7 @@ end
 function apply_forest(forest::Ensemble{S, T}, features::AbstractMatrix{S}) where {S, T}
     N = size(features,1)
     predictions = Array{T}(undef, N)
-    for i in 1:N
+    Threads.@threads for i in 1:N
         predictions[i] = apply_forest(forest, features[i, :])
Note: With the current implementation, the speed improvements from multithreading will come with a significantly increased memory footprint, especially for large forests. We could rewrite the existing codebase to reduce this.
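The comment does not say where the extra memory comes from. One allocation visible in the loop itself is `features[i, :]`, which copies each row in Julia. As a hedged sketch of one possible reduction, assuming the per-row method accepts any `AbstractVector` (the function `row_score` below is a hypothetical stand-in, not the library's code), a view avoids that copy:

```julia
# Hypothetical per-row function; stands in for applying a forest to one sample.
row_score(row::AbstractVector{<:Real}) = sum(row)

# Copying version: X[i, :] allocates a new vector for every row.
function scores_copy(X::AbstractMatrix{<:Real})
    out = Vector{Float64}(undef, size(X, 1))
    Threads.@threads for i in 1:size(X, 1)
        out[i] = row_score(X[i, :])
    end
    return out
end

# View version: @view wraps the row without copying its elements.
function scores_view(X::AbstractMatrix{<:Real})
    out = Vector{Float64}(undef, size(X, 1))
    Threads.@threads for i in 1:size(X, 1)
        out[i] = row_score(@view X[i, :])
    end
    return out
end
```

Whether this accounts for much of the footprint in practice would need measuring, for example with `@allocated` on both versions.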
Do these criticisms apply similarly to the current use of multithreading in building a forest?
> Do these criticisms apply similarly to the current use of multithreading in building a forest?

Yes. The current implementation might be allocation-heavy. Adding multithreading affects this.
I think we should check for the case of Threads.nthreads() == 1. In that case, because of task overhead, the single-threaded implementation is better.
Other than this, LGTM.
I think there is room for improving performance, but this can be addressed later.
I guess one could multithread application of a forest to a vector instead. I think it is better to do it at this level.
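To make the two levels concrete, here is a rough sketch using a toy forest of plain functions with a two-class majority vote. Everything in it (`trees`, `majority`, and both function names) is hypothetical and only illustrates where the `Threads.@threads` loop sits in each variant:

```julia
# Hypothetical toy "trees": each maps a feature vector to class label 1 or 2.
trees = [x -> x[1] > 0 ? 1 : 2,
         x -> x[2] > 0 ? 1 : 2,
         x -> sum(x) > 0 ? 1 : 2]

# Majority vote over labels 1 and 2 (ties go to label 1).
majority(votes) = count(==(1), votes) >= count(==(2), votes) ? 1 : 2

# Vector level: thread over the trees for a single sample.
function predict_vector_threaded(trees, x::AbstractVector)
    votes = Vector{Int}(undef, length(trees))
    Threads.@threads for t in eachindex(trees)
        votes[t] = trees[t](x)
    end
    return majority(votes)
end

# Matrix level (the approach in this PR): thread over rows,
# evaluating all trees serially within each row.
function predict_matrix_threaded(trees, X::AbstractMatrix)
    predictions = Vector{Int}(undef, size(X, 1))
    Threads.@threads for i in 1:size(X, 1)
        row = X[i, :]
        predictions[i] = majority([tree(row) for tree in trees])
    end
    return predictions
end
```

With many rows, the matrix-level loop gives each thread a larger chunk of work per task, which is one plausible reason to prefer it.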