findall has strange behavior for impure predicates on Bool Arrays #46425

@LilithHafner

julia> using Random

julia> f(x) = x || rand(Bool);

julia> a = [false, false, true];

julia> Random.seed!(10);

julia> findall(f, a)
2-element Vector{Int64}:
 1
 2
   # Where is three?
julia> findall(f, a)
3-element Vector{Int64}:
 2
 3 # There it is!
 4 # Oh, hello four, I didn't expect to see you here.

julia> versioninfo() # 25 days ago
Julia Version 1.9.0-DEV.1053
Commit 9e22e567f29 (2022-07-26 14:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.5.0)
  CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.5 (ORCJIT, skylake)
  Threads: 1 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/lib
  JULIA_PKG_PRECOMPILE_AUTO = 0

Not returning 3 is bad. Returning 4, which is not a valid index into a three-element array, could lead to segfaults and is definitely a bug.

We have two implementations of findall(f::Function, a::AbstractArray):

  1. Precompute f.(a), then get the indices from the resulting BitArray
    findall(testf::Function, A::AbstractArray) = findall(testf.(A))
  2. Compute count(f, a), preallocate an index array of that length, and make a second pass over a, recomputing f(v) for each element

    julia/base/array.jl

    Lines 2375 to 2393 in aac466f

    function _findall(f::Function, A::AbstractArray{Bool})
        n = count(f, A)
        I = Vector{eltype(keys(A))}(undef, n)
        isempty(I) && return I
        _findall(f, I, A)
    end

    function _findall(f::Function, I::Vector, A::AbstractArray{Bool})
        cnt = 1
        len = length(I)
        for (k, v) in pairs(A)
            @inbounds I[cnt] = k
            cnt += f(v)
            cnt > len && return I
        end
        # In case of impure f, this line could potentially be hit. In that case,
        # we can't assume I is the correct length.
        resize!(I, cnt - 1)
    end
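
To make the difference between the two implementations reproducible without a random number generator, here is a rough sketch. The names `scripted_predicate` and `twopass_findall` are hypothetical and not part of Base. `twopass_findall` is a bounds-checked copy of the two-pass strategy quoted above, and `scripted_predicate` builds an impure predicate whose answers for `false` inputs follow a fixed script. Because the copy keeps bounds checking and the `resize!` call, it is not expected to reproduce the out-of-range index from the report, only the missing one.

```julia
# Hypothetical helper: an impure predicate that returns `true` for `true`
# inputs and otherwise replays `script`, one entry per call.
function scripted_predicate(script::Vector{Bool})
    calls = Ref(0)
    return function (x::Bool)
        calls[] += 1
        return x || script[mod1(calls[], length(script))]
    end
end

# Hypothetical, bounds-checked copy of the two-pass strategy (implementation 2).
function twopass_findall(f, A::AbstractArray{Bool})
    n = count(f, A)                        # first pass: how many hits?
    I = Vector{eltype(keys(A))}(undef, n)
    isempty(I) && return I
    cnt = 1
    len = length(I)
    for (k, v) in pairs(A)                 # second pass: f is evaluated again
        I[cnt] = k
        cnt += f(v)
        cnt > len && return I
    end
    return resize!(I, cnt - 1)
end

a = [false, false, true]
script = [true, false, false, true, true, false]

# Implementation 1: f is evaluated once per element, so index 3 is always found.
one_pass = findall(scripted_predicate(script).(a))

# Implementation 2: the two passes see different answers from f.
two_pass = twopass_findall(scripted_predicate(script), a)
```

With this script the first pass counts two hits, and the second pass fills both slots with indices 1 and 2 before it ever reaches the element that is actually `true`.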

For simple f, implementation 2 is about 1.5x faster according to my rough benchmarks. If the runtime of f dominates, then implementation 1 should be about 2x faster, because it calls f once per element rather than up to twice. If f is impure, then 1 behaves as one would expect and 2 can have bizarre consequences.
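
The factor of two for expensive f comes from the number of predicate calls, which can be checked directly. The following is only a sketch: `CountingPredicate`, `one_pass_calls`, and `two_pass_calls` are hypothetical names, and the loop in `two_pass_calls` is a stand-in for the second pass of implementation 2 (including its early exit), not the Base code itself.

```julia
# Hypothetical wrapper that counts how often the wrapped predicate is called.
struct CountingPredicate{F}
    f::F
    calls::Base.RefValue{Int}
end
CountingPredicate(f) = CountingPredicate(f, Ref(0))

function (p::CountingPredicate)(x)
    p.calls[] += 1
    return p.f(x)
end

# Implementation 1: broadcast once, then collect indices.
function one_pass_calls(f, a)
    p = CountingPredicate(f)
    findall(p.(a))
    return p.calls[]
end

# Stand-in for implementation 2: count first, then scan again until all
# counted hits have been seen.
function two_pass_calls(f, a)
    p = CountingPredicate(f)
    n = count(p, a)
    found = 0
    for v in a
        found += p(v)
        found >= n && break
    end
    return p.calls[]
end

# `!` is true at the even positions, so the last hit is the last element
# and the early exit saves nothing here.
a = [isodd(i) for i in 1:1000]
calls_one = one_pass_calls(!, a)
calls_two = two_pass_calls(!, a)
```

When the last hit sits at the end of the array, as it does here, the two-pass variant evaluates the predicate twice per element. When the hits cluster near the front, the early exit makes the second pass cheaper.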

Ideally, we dispatch to 2 for simple pure f and 1 otherwise. Our current heuristic is to dispatch to 2 for a::AbstractArray{Bool} and 1 otherwise. This is a bad heuristic. It would be cool to dispatch based on effect analysis, but if that is not an option, my preference is to use f === identity as the heuristic (even though this is a performance hit in some cases).
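
As a sketch of the `f === identity` heuristic, assuming a hypothetical function name `findall_sketch` (this is not a proposed patch to Base), the dispatch could look like this:

```julia
# Fast path only for the one predicate known to be pure and trivial.
# findall(A) on a Bool array does not need to call any user function.
findall_sketch(::typeof(identity), A::AbstractArray{Bool}) = findall(A)

# Everything else: evaluate f exactly once per element, then collect indices.
findall_sketch(f, A::AbstractArray) = findall(f.(A))

a = [false, false, true]
hits_identity = findall_sketch(identity, a)
hits_negated = findall_sketch(!, a)
```

Dispatching on `typeof(identity)` is equivalent to checking `f === identity`, since `identity` is the only instance of its type. The cost is that other simple pure predicates on Bool arrays, such as `!`, take the slower broadcast path.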

I suspect this was introduced by #42202 (cc @jakobnissen) which fixed #42187.

Labels: backport 1.9, bug, correctness bug ⚠, regression, search & find
