julia> f(x) = x || rand(Bool);
julia> a = [false, false, true];
julia> using Random
julia> Random.seed!(10);
julia> findall(f, a)
2-element Vector{Int64}:
1
2
# Where is three?
julia> findall(f, a)
3-element Vector{Int64}:
2
3 # There it is!
4 # Oh, hello four, I didn't expect to see you here.
julia> versioninfo() # 25 days ago
Julia Version 1.9.0-DEV.1053
Commit 9e22e567f29 (2022-07-26 14:21 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.5.0)
CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.5 (ORCJIT, skylake)
Threads: 1 on 2 virtual cores
Environment:
LD_LIBRARY_PATH = /usr/local/lib
JULIA_PKG_PRECOMPILE_AUTO = 0
Not returning 3 is bad. Returning 4, an index past the end of a 3-element array, could lead to segfaults and is definitely a bug.
We have two implementations of findall(f::Function, a::AbstractArray):
1. Precompute f.(a) and then get the indices from the resulting BitArray (line 2328 in aac466f):
    findall(testf::Function, A::AbstractArray) = findall(testf.(A))
2. Call count(f, a), preallocate an index array, and iterate through a second time, recomputing f for each element (lines 2375 to 2393 in aac466f):
    function _findall(f::Function, A::AbstractArray{Bool})
        n = count(f, A)
        I = Vector{eltype(keys(A))}(undef, n)
        isempty(I) && return I
        _findall(f, I, A)
    end

    function _findall(f::Function, I::Vector, A::AbstractArray{Bool})
        cnt = 1
        len = length(I)
        for (k, v) in pairs(A)
            @inbounds I[cnt] = k
            cnt += f(v)
            cnt > len && return I
        end
        # In case of impure f, this line could potentially be hit. In that case,
        # we can't assume I is the correct length.
        resize!(I, cnt - 1)
    end
For simple f, 2 is about 1.5x faster according to my rough benchmarks. If the runtime of f dominates, then 1 should be 2x faster, since 2 evaluates f twice per element. If f is impure, then 1 behaves how one would expect and 2 can have bizarre consequences.
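To make the difference concrete without depending on the RNG, here is a minimal sketch of both strategies as standalone functions. The names findall_precompute, findall_two_pass, and false_on_second_call are hypothetical and not part of Base. The two-pass version follows the quoted AbstractArray{Bool} logic but keeps bounds checking on, so it only illustrates the inconsistent results, not the out-of-bounds index.

```julia
# Hypothetical impure predicate: returns x, except that every call other
# than the second one returns true regardless of x.
function false_on_second_call()
    calls = Ref(0)
    return x -> begin
        calls[] += 1
        x || calls[] != 2
    end
end

# Strategy 1 (sketch): evaluate f exactly once per element, then collect indices.
findall_precompute(f, A::AbstractArray{Bool}) = findall(f.(A))

# Strategy 2 (sketch): count matches, preallocate, then evaluate f a second time.
function findall_two_pass(f, A::AbstractArray{Bool})
    n = count(f, A)                        # first pass over A
    I = Vector{eltype(keys(A))}(undef, n)
    isempty(I) && return I
    cnt = 1
    for (k, v) in pairs(A)                 # second pass recomputes f
        I[cnt] = k
        cnt += f(v)
        cnt > n && return I                # stops early once I is full
    end
    resize!(I, cnt - 1)
end

a = [false, false, true]
r1 = findall_precompute(false_on_second_call(), a)
r2 = findall_two_pass(false_on_second_call(), a)
println(r1)
println(r2)
```

With this predicate the first strategy reports [1, 3], matching what f returned on its single pass. The second strategy counts two matches, fills both slots with indices 1 and 2 during the second pass, and returns before it ever looks at index 3, even though f(true) is always true.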
Ideally, we would dispatch to 2 for simple pure f and to 1 otherwise. Our current heuristic is to dispatch to 2 for a::AbstractArray{Bool} and to 1 otherwise. This is a bad heuristic, because the element type of a says nothing about whether f is pure. It would be cool to dispatch based on effect analysis, but if that is not an option, my preference is to use f === identity as the heuristic (even though this is a performance hit in some cases).
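A rough sketch of what the f === identity heuristic could look like, written as a standalone function rather than as a patch to Base. The names findall_guarded and true_on_odd_calls are hypothetical, and the real fix may well be structured differently.

```julia
# Hypothetical sketch: only take the count-then-fill path when f is known
# to be pure, which here means f === identity.
function findall_guarded(f, A::AbstractArray{Bool})
    if f === identity
        n = count(A)                           # number of true elements
        I = Vector{eltype(keys(A))}(undef, n)
        cnt = 1
        for (k, v) in pairs(A)
            v || continue
            I[cnt] = k                         # in bounds: exactly n elements are true
            cnt += 1
        end
        return I
    end
    # Any other f is evaluated exactly once per element.
    return findall(f.(A))
end

# Hypothetical impure predicate: true for true inputs and on odd-numbered calls.
function true_on_odd_calls()
    calls = Ref(0)
    return x -> begin
        calls[] += 1
        x || isodd(calls[])
    end
end

a = [false, false, true]
pure_result = findall_guarded(identity, a)
impure_result = findall_guarded(true_on_odd_calls(), a)
println(pure_result)
println(impure_result)
```

The cost is that pure predicates other than identity lose the faster path, which is the performance hit mentioned above.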
I suspect this was introduced by #42202 (cc @jakobnissen) which fixed #42187.