findall has strange behavior for impure predicates on Bool Arrays

```julia
julia> f(x) = x || rand(Bool);

julia> a = [false, false, true];

julia> Random.seed!(10);

julia> findall(f, a)
2-element Vector{Int64}:
 1
 2
   # Where is three?
julia> findall(f, a)
3-element Vector{Int64}:
 2
 3 # There it is!
 4 # Oh, hello four, I didn't expect to see you here.

julia> versioninfo() # 25 days ago
Julia Version 1.9.0-DEV.1053
Commit 9e22e567f29 (2022-07-26 14:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.5.0)
  CPU: 4 × Intel(R) Core(TM) i5-8210Y CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.5 (ORCJIT, skylake)
  Threads: 1 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/lib
  JULIA_PKG_PRECOMPILE_AUTO = 0
```
Not returning `3` is bad. Returning `4`, an index past the end of a 3-element array, could lead to segfaults and is definitely a bug.

We have two implementations of `findall(f::Function, a::AbstractArray)`:
1) Precompute `f.(a)` and then get indices from the `BitArray`
https://github.com/JuliaLang/julia/blob/aac466fcbcadbe8a9c101dee19c8eddecfdfdcd7/base/array.jl#L2328
2) Compute `count(f, a)`, preallocate an index array, and iterate through `a` a second time, recomputing `f` for each element
https://github.com/JuliaLang/julia/blob/aac466fcbcadbe8a9c101dee19c8eddecfdfdcd7/base/array.jl#L2375-L2393
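For readers who do not want to follow the links, the two strategies can be sketched roughly as follows. These are simplified illustrations, not the code in Base: the names `findall_precompute` and `findall_twopass` are hypothetical, and the bounds guard and final `resize!` in the second function are my own assumptions about how one might keep it memory-safe.

```julia
# Strategy 1 (sketch): evaluate the predicate once per element,
# then read the matching indices off the resulting mask.
function findall_precompute(f, a::AbstractArray)
    mask = map(f, a)                       # one call to f per element
    return [k for (k, hit) in zip(keys(a), mask) if hit]
end

# Strategy 2 (sketch): count the hits, preallocate, then evaluate the
# predicate a second time while filling the index vector.
function findall_twopass(f, a::AbstractArray)
    n = count(f, a)                        # first pass over a
    out = Vector{eltype(keys(a))}(undef, n)
    i = 0
    for (k, v) in pairs(a)
        i == n && break                    # never write past the allocation
        if f(v)                            # second call to f for this element
            i += 1
            out[i] = k
        end
    end
    return resize!(out, i)                 # drop slots that were never filled
end
```

For a pure predicate the two agree with each other and with `findall`. They can only diverge when `f` returns different answers on the first and second pass.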

For simple `f`, approach 2 is about 1.5x faster according to my rough benchmarks. If the runtime of `f` dominates, approach 1 should be about 2x faster, since it calls `f` once per element rather than twice. If `f` is impure, approach 1 behaves as one would expect, while approach 2 can have bizarre consequences.
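The factor of two comes from the number of predicate calls. A small sketch that counts calls makes this visible. The `counting` helper is hypothetical, and the two code paths below only imitate the two implementations rather than calling into Base.

```julia
# Wrap a predicate so that every call increments a counter (hypothetical helper).
function counting(f)
    calls = Ref(0)
    g = x -> (calls[] += 1; f(x))
    return g, calls
end

a = [false, true, false, true]

# Precompute style: one evaluation per element, then index extraction.
g1, calls1 = counting(identity)
mask = map(g1, a)
idx1 = [i for i in eachindex(a) if mask[i]]

# Two-pass style: count first, then evaluate again while collecting indices.
g2, calls2 = counting(identity)
n = count(g2, a)
idx2 = [i for i in eachindex(a) if g2(a[i])]

println((calls1[], calls2[]))   # number of predicate calls in each style
```

With a pure predicate both styles find the same indices, but the two-pass style evaluates the predicate `2 * length(a)` times. With an impure predicate, `n` and the second pass can disagree, which is the source of the behavior reported above.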

Ideally, we dispatch to 2 for simple pure `f` and 1 otherwise. Our current heuristic is to dispatch to 2 for `a::AbstractArray{Bool}` and 1 otherwise. This is a bad heuristic. It would be cool to dispatch based on effect analysis, but if that is not an option, my preference is to use `f === identity` as the heuristic (even though this is a performance hit in some cases).
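A minimal sketch of the `f === identity` heuristic, written as a standalone wrapper rather than as a patch to Base. The name `my_findall` is hypothetical. It relies only on the existing single-argument `findall(a)` for Bool arrays and on `map`.

```julia
# Fast path: identity is known to be pure and cheap, so reading the
# Bool array directly is safe.
my_findall(::typeof(identity), a::AbstractArray{Bool}) = findall(a)

# General path: evaluate f exactly once per element, then extract the
# indices of the true entries from the precomputed mask.
my_findall(f, a::AbstractArray) = findall(map(f, a))
```

Because the general path never calls `f` a second time, an impure predicate can change which indices are returned between calls, but it cannot produce an index outside `keys(a)` or a result whose length disagrees with its contents.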

I suspect this was introduced by #42202 (cc @jakobnissen) which fixed #42187.

	function _findall(f::Function, A::AbstractArray{Bool})
	    n = count(f, A)
	    I = Vector{eltype(keys(A))}(undef, n)
	    isempty(I) && return I
	    _findall(f, I, A)
	end

	function _findall(f::Function, I::Vector, A::AbstractArray{Bool})
	    cnt = 1
	    len = length(I)
	    for (k, v) in pairs(A)
	        @inbounds I[cnt] = k
	        cnt += f(v)
	        cnt > len && return I
	    end
	    # In case of impure f, this line could potentially be hit. In that case,
	    # we can't assume I is the correct length.
	    resize!(I, cnt - 1)
	end

findall has strange behavior for impure predicates on Bool Arrays #46425
