-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Make if_any() and if_all() consistent in all contexts
#7747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if_any() and if_all() with zero or one inputsif_any() and if_all() consistent in all contexts
CLAUDE.md
Outdated
| @@ -0,0 +1,76 @@ | |||
| # CLAUDE.md | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plucked from ellmer, with modifications. I'm just trying it out.
R/across.R
Outdated
| dplyr_list_pany_pall(x, "any", ..., size = size, error_call = error_call) | ||
| } | ||
|
|
||
| dplyr_list_pany_pall <- function( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hoping to just remove this in favor of the vctrs versions soon, but I needed to get the semantics of it right enough to be able to add all the tests here
R/across.R
Outdated
| init <- vec_rep(init, times = size) | ||
|
|
||
| reduce(x, op, .init = init) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having an initial value is an important reason that the 0 and 1 input cases now "just work" correctly. This was NULL before.
| expr <- expr({ | ||
| ns <- asNamespace("dplyr") | ||
|
|
||
| combine <- function(x, y) { | ||
| if (is_null(x)) { | ||
| y | ||
| } else { | ||
| call(op, x, y) | ||
| } | ||
| } | ||
| expr <- reduce(quos, combine, .init = NULL) | ||
| x <- list(!!!quos) | ||
|
|
||
| # In the evaluation path, `across()` automatically recycles to common size, | ||
| # so we must here as well for compatibility. `across()` also returns a 0 | ||
| # col, 1 row data frame in the case of no inputs so that it will recycle to | ||
| # the group size, which we also do here. | ||
| size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(!!if_fn)) | ||
| x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(!!if_fn)) | ||
|
|
||
| ns[[!!dplyr_fn]](x, size = size, error_call = call(!!if_fn)) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same kind of trick we use in as_pick_expansion() for the expansion path. I've carefully tried to match it to the evaluation path.
Basically we replace if_any(c(x, y), fn) with something like
x <- list(x = x, y = y)
ns <- asNamespace("dplyr")
size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(if_any()))
x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(if_any()))
ns[["dplyr_list_pany"]](x, size = size, error_call = call(if_any()))| ) | ||
| }) | ||
|
|
||
| test_that("`across()` recycle `.fns` results to common size", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Along the way I nearly changed across() to recycle inputs to the group size rather than recycling them to their common size. I think that would have been a mistake so I've added a test to prevent us from ever thinking of doing this.
| }) | ||
| }) | ||
|
|
||
| test_that("`if_any()` and `if_all()` have consistent behavior across `filter()` and `mutate()`", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a mega test to make sure things are consistent everywhere.
It's obviously a lot of tests, but I think we really do need them all to be sure we aren't missing an edge case. These are all very hard to reason about since there are so many dimensions that intersect (filter vs mutate, expansion vs evaluation, groups vs no groups, etc)
Let's put it this way, I feel way more confident about this now that we have this test that hits every edge case
5ad9c62 to
21023cb
Compare
| combine <- function(x, y) { | ||
| if (is_null(x)) { | ||
| y | ||
| } else { | ||
| call(op, x, y) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This previous expansion was written with dbplyr and other backends in mind. We thought the dplyr expansions could be used as is in these other settings, removing some work for them as they would already know how to deal with the bare expansion.
Unfortunately, as the linked issues reveal, these bare expansions don't work that well for the dplyr backend because they are missing things like input validation. If we add these in the expansion, then the original purpose of generic translation is defeated.
We've never actually pushed towards using these expansions in other packages, so although this feels like a step backward, we don't currently lose anything by making the expansion untranslatable.
@DavisVaughan Maybe add a comment about why we still need the expansion at all (to avoid tidyselect getting evaluated on every group).
b7b4645 to
be4c8fe
Compare
More tweaks Accept exactly what Claude Code gave us Rework Claude's attempt Make evaluation and expansion cases more consistent And add a battery of tests to ensure we don't regress on this consistency Remove a TODO and update a snapshot test! Collect `quos` first in case the user has a column named `ns` Update snapshot test Switch to a non snapshot based test Use `vec_pany()` and `vec_pall()` Remove claude files Add comment about what expansion is for Move vctrs wrappers
be4c8fe to
d8b66bd
Compare
Closes #7746
Closes #7077
It turns out that our application of
if_any()andif_all()was fairly inconsistent. This is due to the fact that they are tricky to get right, we have 2 different implementations of them. One for the expansion case and one for the evaluation case. I've now tried to unify these to use the same underlying implementation,dplyr_list_pany()ordplyr_list_pall()depending on the scenario (which just usevec_pany()andvec_pall()under the hood)In addition to greater consistency across the board, you'll also note that in errors the
In argument:label also now reports the original expression pre-expansion in thefilter()cases, which is a much better errorIn the examples below, for
filter(), note that adding()around theif_any()orif_all()calls triggers the evaluation case rather than the expansion case.