
Conversation

salbert83 (Contributor)

I guess one could instead multithread the application of the forest to a single vector, but I think it is better to do it at this level.
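For contrast, here is a minimal sketch of that alternative (threading over the trees when predicting a single sample). It is not the change made in this PR; it assumes the `trees` field of `Ensemble` and the exported `apply_tree` from DecisionTree.jl, and shows aggregation for the regression case only.

```julia
using DecisionTree
using Statistics: mean

# Hypothetical tree-wise threaded prediction for a single sample (illustration only).
function apply_forest_treewise(forest::DecisionTree.Ensemble{S, T},
                               features::AbstractVector{S}) where {S, T}
    n = length(forest.trees)
    votes = Array{T}(undef, n)
    Threads.@threads for i in 1:n
        votes[i] = apply_tree(forest.trees[i], features)
    end
    return mean(votes)  # regression; a classifier would take a majority vote instead
end
```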

codecov-commenter commented Jun 18, 2022

Codecov Report

Merging #175 (bd1a773) into master (7e090bb) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #175   +/-   ##
=======================================
  Coverage   89.51%   89.51%           
=======================================
  Files          10       10           
  Lines         992      992           
=======================================
  Hits          888      888           
  Misses        104      104           
Impacted Files                 Coverage Δ
src/classification/main.jl     97.56% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ablaom requested a review from OkonSamuel on June 20, 2022 04:06
ablaom (Member) commented Jun 20, 2022

For prediction with a forest, I agree that it makes sense to always use multithreading. @OkonSamuel, your thoughts? (And can you please review?)

There are possibilities for multithreading in training, both for forests and for an individual tree, but let's leave that for a separate PR. In that case we might want to make the mode of acceleration switchable (between CPU1() and CPUThreads()).
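Purely as a sketch of what a switchable mode might look like (nothing below is DecisionTree.jl API: `CPU1` and `CPUThreads` are the resource types exported by ComputationalResources.jl, and `foreach_index` is a hypothetical helper):

```julia
using ComputationalResources: CPU1, CPUThreads

# Hypothetical helper: run `f` on each index, serially or threaded,
# selected by dispatching on the resource type.
foreach_index(::CPU1, f, idx) = foreach(f, idx)

function foreach_index(::CPUThreads, f, idx)
    Threads.@threads for i in idx
        f(i)
    end
end

# Usage sketch: a prediction loop could then be written once as
#   foreach_index(acceleration, i -> predictions[i] = apply_forest(forest, features[i, :]), 1:N)
```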

@@ -271,7 +271,7 @@ end
 function apply_forest(forest::Ensemble{S, T}, features::AbstractMatrix{S}) where {S, T}
     N = size(features,1)
     predictions = Array{T}(undef, N)
-    for i in 1:N
+    Threads.@threads for i in 1:N
Member

I think we should check for the case Threads.nthreads() == 1; with a single thread, the plain single-threaded loop is better because of the task overhead.
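A minimal sketch of that guard (not the code as merged) could wrap the two loop variants like this:

```julia
# Sketch: fall back to the plain loop when only one thread is available,
# avoiding the task overhead of Threads.@threads.
function apply_forest(forest::Ensemble{S, T}, features::AbstractMatrix{S}) where {S, T}
    N = size(features, 1)
    predictions = Array{T}(undef, N)
    if Threads.nthreads() == 1
        for i in 1:N
            predictions[i] = apply_forest(forest, features[i, :])
        end
    else
        Threads.@threads for i in 1:N
            predictions[i] = apply_forest(forest, features[i, :])
        end
    end
    return predictions
end
```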

@@ -271,7 +271,7 @@ end
 function apply_forest(forest::Ensemble{S, T}, features::AbstractMatrix{S}) where {S, T}
     N = size(features,1)
     predictions = Array{T}(undef, N)
-    for i in 1:N
+    Threads.@threads for i in 1:N
         predictions[i] = apply_forest(forest, features[i, :])
Member

Note: with the current implementation, the speed improvement from multithreading comes with a significantly increased memory footprint, especially for large forests. We could rewrite the existing codebase to reduce this.
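One small, independent way to reduce per-iteration allocations (a sketch, not part of this PR, and assuming the single-sample `apply_forest` method accepts any `AbstractVector`) is to pass a view of each row instead of the copy produced by `features[i, :]`:

```julia
# Sketch: avoid copying each row before prediction.
Threads.@threads for i in 1:N
    predictions[i] = apply_forest(forest, view(features, i, :))
end
```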

Contributor Author

Do these criticisms apply similarly to the current use of multithreading in building a forest?

Member

> Do these criticisms apply similarly to the current use of multithreading in building a forest?

Yes. The current implementation might be allocation-heavy, and adding multithreading compounds that.

@OkonSamuel left a comment (Member)

I think we should check for the case Threads.nthreads() == 1; with a single thread, the plain single-threaded loop is better because of the task overhead.
Other than this, LGTM.
I think there is room for improving performance, but that can be addressed later.
