[SPARK-14932][SQL] Allow DataFrame.replace() to replace values with None #16225
Would we need a test for Scala too? I checked this myself and it seems to work fine with Scala. I could argue, though not strongly, that this affects the language-specific functions of both Python and Scala. |
| case _ => replacement.map { case (k, v) => (convertToDouble(k), convertToDouble(v)) } | ||
| } | ||
| val replacementMap: Map[_, _] = | ||
| if (replacement.head._2 == null) |
I think you can just write case null => here and thereby avoid the if else
I tried that but failed. Scala doesn't allow pattern matching with null, scala.Nothing and scala.Null
Oh, wait, case null => works. I tried case v null, which doesn't. Let me modify this. Thanks!
|
Test build #3490 has finished for PR 16225 at commit
|
|
ok to test |
|
@bravo-zhang Could you resolve the conflicts? I will review it then. Thanks! |
|
Test build #77261 has finished for PR 16225 at commit
|
|
Thanks for taking a look, @gatorsmile. The conflicts have been resolved. |
|
Test build #77273 has finished for PR 16225 at commit
|
|
Test build #77282 has finished for PR 16225 at commit
|
holdenk
left a comment
Thanks for working on this, and sorry it's taken so long to review. I did a first read-through of the Python side and I've got two minor questions. Hopefully @zero323, @davie, or @gatorsmile will also have some time to take a look once the current release is finished.
| "Got {0}".format(type(to_replace))) | ||
|
|
||
| if not isinstance(value, valid_types) and not isinstance(to_replace, dict): | ||
| if not isinstance(value, valid_types) and value is not None \ |
So, a slightly jet-lagged style question: would this be clearer if we just added type(None) to L1398? I know PEP 8 says we should only use is / is not to check whether something is None, rather than depending on the implicit conversion to boolean, but since we're really checking the type here, we aren't in danger of that. (This is just a suggestion to make it easier to read; if others think it's easier to read this way, that's fine :)). @davies?
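To make the style question concrete, here is a minimal sketch comparing the two ways of writing the check. The tuple contents and the function names are hypothetical stand-ins (the real valid_types in the PySpark source may differ, and Python 3 types are used here instead of long and basestring); the only point is that both forms accept and reject the same values.

```python
# Hypothetical stand-in for valid_types; the actual tuple in the PySpark
# source is an assumption here and may contain different types.
VALID_TYPES = (bool, float, int, str, list, tuple)

# Suggested variant: fold NoneType into the tuple used by isinstance().
VALID_TYPES_OR_NONE = VALID_TYPES + (type(None),)


def check_with_is_none(value):
    """Style used in the diff: a separate identity test for None."""
    return isinstance(value, VALID_TYPES) or value is None


def check_with_none_type(value):
    """Style suggested in the review: a single isinstance() call."""
    return isinstance(value, VALID_TYPES_OR_NONE)


# Both checks agree on every candidate, including None and invalid types.
for candidate in (1, 2.5, "a", None, {"k": "v"}, object()):
    assert check_with_is_none(candidate) == check_with_none_type(candidate)
```

Since isinstance(None, type(None)) is a type check and not a truthiness test, the PEP 8 concern about implicit boolean conversion does not come into play in either form.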
| to_replace = [to_replace] | ||
|
|
||
| if isinstance(value, (float, int, long, basestring)): | ||
| if isinstance(value, (float, int, long, basestring)) or value is None: |
same as above
| if not any(all_of_type(rep_dict.keys()) and all_of_type(rep_dict.values()) | ||
| if not any(all_of_type(rep_dict.keys()) | ||
| and (all_of_type(rep_dict.values()) | ||
| or list(rep_dict.values()).count(None) == len(rep_dict)) |
In the Scala code, null is allowed to be the replacement value for some of the elements, but not in the Python code. Is this intentional? If so, we should document it clearly and expand on the error message below (otherwise we should make it more flexible).
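The difference between the two behaviours can be sketched in plain Python. The helpers below are hypothetical (all_of_type takes an explicit types argument here, which is an assumption and not the signature in the PySpark source): strict_values_ok mirrors the condition in the diff, where the values must either all share a type or all be None, while flexible_values_ok shows one possible way to allow None for only some entries.

```python
def all_of_type(values, types):
    """Return True if every value is an instance of `types` (hypothetical helper)."""
    return all(isinstance(v, types) for v in values)


def strict_values_ok(rep_dict, types):
    """Mirror of the check in the diff: all values typed, or all values None."""
    values = list(rep_dict.values())
    return all_of_type(values, types) or values.count(None) == len(values)


def flexible_values_ok(rep_dict, types):
    """Possible relaxation: ignore None entries, then type-check the rest."""
    non_null = [v for v in rep_dict.values() if v is not None]
    return all_of_type(non_null, types)


mixed = {"a": "x", "b": None}       # None for only some of the keys
all_none = {"a": None, "b": None}   # None for every key

strict_mixed = strict_values_ok(mixed, str)
flexible_mixed = flexible_values_ok(mixed, str)
```

With these definitions the mixed dictionary is rejected by the strict check and accepted by the flexible one, which is the gap between the Python and Scala sides that the comment points out.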
|
We are closing this due to inactivity. Please reopen it if you want to push it forward. Thanks! |
|
@holdenk Thanks for the review. I'll combine type(None) in the |
What changes were proposed in this pull request?
Allow DataFrame.replace() to replace with None/null values.
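As a rough illustration of the intended semantics, the following is a plain-Python model, not Spark code: rows are modelled as dictionaries and replace_values is a hypothetical helper, so nothing here reflects the actual DataFrame.replace() implementation.

```python
def replace_values(rows, to_replace, value):
    """Return copies of dict rows with cells equal to `to_replace` set to `value`.

    Hypothetical model of the proposed behaviour; `value` may be None.
    """
    return [
        {col: (value if cell == to_replace else cell) for col, cell in row.items()}
        for row in rows
    ]


rows = [{"name": "Alice", "age": 10}, {"name": "Bob", "age": 5}]

# Replacing a matching value with None yields a null cell in the result,
# while the input rows are left untouched.
replaced = replace_values(rows, "Alice", None)
```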
How was this patch tested?
Python doctest and unit test. Scala unit test.