-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-12044] [SparkR] Fix usage of isnan, isNaN #10037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
6805952
Fix usage of isnan, isnull, isNaN, isNull, isNotNull
yanboliang 00ce43a
add more test cases
yanboliang 041c9c6
Add alias is.nan
yanboliang 10a5fe7
Not to expose isnull
yanboliang 95fdd2c
remove setMethod for isnull
yanboliang 3ee7d5c
fix test case
yanboliang File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if
isNaNis being removed, we shouldn't set this generic right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, isNaN is around since Spark 1.5 - if we are taking this out we would need to note this breaking change in release doc with a JIRA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, rereading the description. it sounds like we have
isNaNfor Columnisnanfor DataFrame?
That seems a bit confusing..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed. I have send #10056 to push SQL side make change to uniform interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't uniform the
isNaNandisnaninterface at SQL side, please refer the comments at #10056. Further more, there are different semantics between Scala and R aboutisnull. So let's narrow this PR to provide the same functions as Scala at SparkR side, this is my original motivation. I will add test cases for column as @sun-rui suggested. Then I think this PR can be merged firstly and we start to discuss corresponding alias or explanations about theNULL/NAdifference at SPARK-12071. @sun-rui @felixcheung @shivaramThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the resolution here?
isNullis still exported after this PR ? How is @felixcheung 's example going to change after this PR ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isNullandisNotNullare still exported after this PR because they have been exist at Spark 1.5.isNullandisNotNull(with upper case) are functions ofColumn, I consider they are Spark specific functions so did not remove them. If we want to remove them, I can do it in a follow-up PR and they are breaking changes need to be explain in release note. @shivaramThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I think I get it. Let me summarize the situation below and let know if I am getting it right.
Columnas defined incolumn.R. These mirror the scala functions.isnanandis.nanforColumnin this PR. These callisnanin Scala. And I presume their behavior is this the same asisNaN?isNaN? I can't find that call in our unit test file, so I guess it doesn't exist in SparkR ? Does this exist in Scala ?NAin R tonullin the SparkSQL side.I think the change looks fine to me, but I just want to understand the different things going on here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that's correct.We could scope this PR to only isnan on Column. And track others with JIRAs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shivaram, to be clear, IsNaN, isnan and is.nan are functions (in org.apache.spark.sql.functions) applied to Column, they are not DataFrame operators. IsNaN function is deprecated by isnan function. We add is.nan as an alias of isnan.