DavisVaughan commented Oct 28, 2025

Closes #7746
Closes #7077

It turns out that our application of if_any() and if_all() was fairly inconsistent. They are tricky to get right, and we had 2 different implementations of them: one for the expansion case and one for the evaluation case. I've now tried to unify these so that both use the same underlying implementation, dplyr_list_pany() or dplyr_list_pall() depending on the scenario (which just use vec_pany() and vec_pall() under the hood).

In addition to greater consistency across the board, you'll also note that in errors the "In argument:" label now reports the original expression pre-expansion in the filter() cases, which makes for a much better error.

In the examples below, for filter(), note that adding () around the if_any() or if_all() calls triggers the evaluation case rather than the expansion case.

library(dplyr)

# With zero inputs, if_any

# Before
df <- tibble(x = 1:2)
filter(df, if_any(c(), identity))
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>
filter(df, (if_any(c(), identity)))
#> # A tibble: 2 × 1
#>       x
#>   <int>
#> 1     1
#> 2     2
filter(df, any())
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>

# After
df <- tibble(x = 1:2)
filter(df, if_any(c(), identity))
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>
filter(df, (if_any(c(), identity)))
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>
filter(df, any())
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>

# With one non-logical input

# Before
df <- tibble(x = 1:2)
filter(df, if_any(x, identity))
#> Error in `filter()`:
#> ℹ In argument: `(function (x) ...`.
#> Caused by error:
#> ! `..1` must be a logical vector, not an integer vector.
filter(df, (if_any(x, identity)))
#> Error in `filter()`:
#> ℹ In argument: `(if_any(x, identity))`.
#> Caused by error:
#> ! `..1` must be a logical vector, not an integer vector.
mutate(df, a = if_any(x, identity))
#> # A tibble: 2 × 2
#>       x     a
#>   <int> <int>
#> 1     1     1
#> 2     2     2

# After
df <- tibble(x = 1:2)
filter(df, if_any(x, identity))
#> Error in `filter()`:
#> ℹ In argument: `if_any(x, identity)`.
#> Caused by error in `if_any()`:
#> ! `x` must be a logical vector, not an integer vector.
filter(df, (if_any(x, identity)))
#> Error in `filter()`:
#> ℹ In argument: `(if_any(x, identity))`.
#> Caused by error in `if_any()`:
#> ! `x` must be a logical vector, not an integer vector.
mutate(df, a = if_any(x, identity))
#> Error in `mutate()`:
#> ℹ In argument: `a = if_any(x, identity)`.
#> Caused by error in `if_any()`:
#> ! `x` must be a logical vector, not an integer vector.

# In general, with non-logical types resulting from applying `.fns` we now error
# more appropriately

# Before
df <- tibble(x = c(TRUE, FALSE), y = c("a", "b"))
filter(df, if_any(c(x, y), identity))
#> Error in `filter()`:
#> ℹ In argument: `|...`.
#> Caused by error in `<function(x) x>(x) | <function(x) x>(y)`:
#> ! operations are possible only for numeric, logical or complex types
filter(df, (if_any(c(x, y), identity)))
#> Error in `filter()`:
#> ℹ In argument: `(if_any(c(x, y), identity))`.
#> Caused by error in `op()`:
#> ! operations are possible only for numeric, logical or complex types

# After
df <- tibble(x = c(TRUE, FALSE), y = c("a", "b"))
filter(df, if_any(c(x, y), identity))
#> Error in `filter()`:
#> ℹ In argument: `if_any(c(x, y), identity)`.
#> Caused by error in `if_any()`:
#> ! `y` must be a logical vector, not a character vector.
filter(df, (if_any(c(x, y), identity)))
#> Error in `filter()`:
#> ℹ In argument: `(if_any(c(x, y), identity))`.
#> Caused by error in `if_any()`:
#> ! `y` must be a logical vector, not a character vector.

DavisVaughan changed the title from "Fix evaluation paths of if_any() and if_all() with zero or one inputs" to "Make if_any() and if_all() consistent in all contexts" on Oct 29, 2025
CLAUDE.md Outdated
@@ -0,0 +1,76 @@
# CLAUDE.md
Member Author

Plucked from ellmer, with modifications. I'm just trying it out.

R/across.R Outdated
dplyr_list_pany_pall(x, "any", ..., size = size, error_call = error_call)
}

dplyr_list_pany_pall <- function(
Member Author

I'm hoping to just remove this in favor of the vctrs versions soon, but I needed to get the semantics of it right enough to be able to add all the tests here.

R/across.R Outdated
Comment on lines 430 to 432
init <- vec_rep(init, times = size)

reduce(x, op, .init = init)
Member Author

Having an initial value is an important reason that the 0 and 1 input cases now "just work" correctly. This was NULL before.
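
A minimal base R sketch of why the initial value matters, assuming base `Reduce()` as a stand-in for the `reduce()` call above and `|` as a stand-in for `op`. The variable names are illustrative only. Note that base `|` silently coerces integers to logical, whereas this PR raises a type error for non-logical inputs, so the sketch only covers the zero and one input shapes, not the validation.

```r
# Two rows, as in the `tibble(x = 1:2)` examples from the description.
size <- 2L

# Without an initial value, zero inputs reduce to NULL, and a single input is
# returned untouched because the operator never runs on it.
no_init_zero <- Reduce(`|`, list())
no_init_one <- Reduce(`|`, list(1:2))

# With an initial value recycled to `size`, zero inputs give a full-size
# all-FALSE result, and a single input still passes through the operator.
init <- rep(FALSE, times = size)
zero_inputs <- Reduce(`|`, list(), init)
one_input <- Reduce(`|`, list(c(TRUE, FALSE)), init)
```

The untouched single input in the no-init case mirrors the old `mutate(df, a = if_any(x, identity))` result, where the integer column came back as is.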

Comment on lines 713 to 674
+ expr <- expr({
+   ns <- asNamespace("dplyr")

- combine <- function(x, y) {
-   if (is_null(x)) {
-     y
-   } else {
-     call(op, x, y)
-   }
- }
- expr <- reduce(quos, combine, .init = NULL)
+   x <- list(!!!quos)

+   # In the evaluation path, `across()` automatically recycles to common size,
+   # so we must here as well for compatibility. `across()` also returns a 0
+   # col, 1 row data frame in the case of no inputs so that it will recycle to
+   # the group size, which we also do here.
+   size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(!!if_fn))
+   x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(!!if_fn))

+   ns[[!!dplyr_fn]](x, size = size, error_call = call(!!if_fn))
+ })
Member Author

This is the same kind of trick we use in as_pick_expansion() for the expansion path. I've carefully tried to match it to the evaluation path.

Basically we replace if_any(c(x, y), fn) with something like:

x <- list(x = x, y = y)
ns <- asNamespace("dplyr")
size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(if_any()))
x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(if_any()))
ns[["dplyr_list_pany"]](x, size = size, error_call = call(if_any()))
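
The three steps in that expansion (find a common size, recycle, then combine in parallel) can be sketched in base R. This is a rough illustration under stated assumptions, not dplyr's implementation: `size_common_sketch()` and `if_any_sketch()` are hypothetical names, the error messages are made up, and the recycling rule is a simplification in which only size 1 inputs recycle.

```r
# Hypothetical stand-in for a "common size" helper. With no inputs it returns
# `absent`, which is what lets the zero-input case recycle to the group size.
size_common_sketch <- function(x, absent = 1L) {
  if (length(x) == 0L) {
    return(absent)
  }
  sizes <- unique(lengths(x))
  sizes <- sizes[sizes != 1L] # size 1 inputs recycle to any size
  if (length(sizes) == 0L) {
    return(1L)
  }
  if (length(sizes) > 1L) {
    stop("Can't recycle inputs to a common size.")
  }
  sizes
}

# Hypothetical stand-in for the whole expansion: size, recycle, validate,
# then reduce with `|` starting from an all-FALSE initial value.
if_any_sketch <- function(x) {
  size <- size_common_sketch(x)
  x <- lapply(x, rep_len, length.out = size)
  for (name in names(x)) {
    if (!is.logical(x[[name]])) {
      stop(sprintf("`%s` must be a logical vector.", name))
    }
  }
  Reduce(`|`, x, rep(FALSE, times = size))
}

two_inputs <- if_any_sketch(list(x = c(FALSE, FALSE, TRUE), y = FALSE))
no_inputs <- if_any_sketch(list())
```

With no inputs the sketch returns a single `FALSE`, which the caller can then recycle to the group size, matching the comment in the diff about `absent = 1L`.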

)
})

test_that("`across()` recycle `.fns` results to common size", {
Member Author

Along the way I nearly changed across() to recycle inputs to the group size rather than recycling them to their common size. I think that would have been a mistake so I've added a test to prevent us from ever thinking of doing this.

})
})

test_that("`if_any()` and `if_all()` have consistent behavior across `filter()` and `mutate()`", {
Member Author

This is a mega test to make sure things are consistent everywhere.

It's obviously a lot of tests, but I think we really do need them all to be sure we aren't missing an edge case. These are all very hard to reason about since there are so many dimensions that intersect (filter vs mutate, expansion vs evaluation, groups vs no groups, etc).

Let's put it this way: I feel way more confident about this now that we have this test that hits every edge case.

DavisVaughan marked this pull request as ready for review October 29, 2025 20:14
DavisVaughan requested a review from lionel- October 29, 2025 20:32
Comment on lines -672 to -678
- combine <- function(x, y) {
-   if (is_null(x)) {
-     y
-   } else {
-     call(op, x, y)
-   }
- }
Member

This previous expansion was written with dbplyr and other backends in mind. We thought the dplyr expansions could be used as is in these other settings, removing some work for them as they would already know how to deal with the bare expansion.

Unfortunately, as the linked issues reveal, these bare expansions don't work that well for the dplyr backend because they are missing things like input validation. If we add these in the expansion, then the original purpose of generic translation is defeated.

We've never actually pushed towards using these expansions in other packages, so although this feels like a step backward, we don't currently lose anything by making the expansion untranslatable.

@DavisVaughan Maybe add a comment about why we still need the expansion at all (to avoid tidyselect getting evaluated on every group).

More tweaks

Accept exactly what Claude Code gave us

Rework Claude's attempt

Make evaluation and expansion cases more consistent

And add a battery of tests to ensure we don't regress on this consistency

Remove a TODO and update a snapshot test!

Collect `quos` first in case the user has a column named `ns`

Update snapshot test

Switch to a non snapshot based test

Use `vec_pany()` and `vec_pall()`

Remove claude files

Add comment about what expansion is for

Move vctrs wrappers
DavisVaughan merged commit 0f402ab into main Nov 19, 2025
14 checks passed
DavisVaughan deleted the fix/if-across branch November 19, 2025 21:35
Development

Successfully merging this pull request may close these issues.

if_any inside mutate, unexpected return on single columns
if_any() does not work as expected inside mutate when no inputs are provided

3 participants