Skip to content

Conversation

@jiayue-zhang
Copy link
Contributor

@jiayue-zhang jiayue-zhang commented Dec 9, 2016

What changes were proposed in this pull request?

Allow DataFrame.replace() to replace with None/null values.

How was this patch tested?

Python doctest and unit test. Scala unit test.

@HyukjinKwon
Copy link
Member

Would we need a test for Scala too? I checked this by myself and it seems working fine with Scala though. I could argue that this affects the language-specific functions of both Python and Scala as a not strong opinion.

case _ => replacement.map { case (k, v) => (convertToDouble(k), convertToDouble(v)) }
}
val replacementMap: Map[_, _] =
if (replacement.head._2 == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just write case null => here and thereby avoid the if else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that but failed. Scala doesn't allow pattern matching with null, scala.Nothing and scala.Null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait case null => works. I tried case v null which doesn't. Let me modify this. Thanks!

@SparkQA
Copy link

SparkQA commented Dec 10, 2016

Test build #3490 has finished for PR 16225 at commit 0b15c8f.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

ok to test

@gatorsmile
Copy link
Member

@bravo-zhang Could you resolve the conflicts? I will review it then. Thanks!

@SparkQA
Copy link

SparkQA commented May 23, 2017

Test build #77261 has finished for PR 16225 at commit 2c532c3.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@jiayue-zhang
Copy link
Contributor Author

Thanks for taking a look, @gatorsmile The conflicts have been resolved.
I appreciate if @zero323 can take a look as well since you made improvement on this function recently.

@SparkQA
Copy link

SparkQA commented May 24, 2017

Test build #77273 has finished for PR 16225 at commit 43fb6bd.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 24, 2017

Test build #77282 has finished for PR 16225 at commit b5424d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@holdenk holdenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, sorry its taken so long to review. I did a first read through the Python side and I've got two minor questions. Hopefully @zero323, @davie , or @gatorsmile can also have some time once the current release is finished to take a look.

"Got {0}".format(type(to_replace)))

if not isinstance(value, valid_types) and not isinstance(to_replace, dict):
if not isinstance(value, valid_types) and value is not None \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So slightly jet lagged style question: would this be clearer if we just add type(None) to L1398? I know PEP8 says we should only use is is not for checking if something is none rather than depending on the implicit conversion to boolean -- but since really checking the type here we aren't really in danger of that. (This is just a suggestion to make it easier to read - if others think its easier to read this way thats fine :)). @davies ?

to_replace = [to_replace]

if isinstance(value, (float, int, long, basestring)):
if isinstance(value, (float, int, long, basestring)) or value is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

if not any(all_of_type(rep_dict.keys()) and all_of_type(rep_dict.values())
if not any(all_of_type(rep_dict.keys())
and (all_of_type(rep_dict.values())
or list(rep_dict.values()).count(None) == len(rep_dict))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Scala code null is allowed in to be the replacement value for some of the elements but not in the Python code. Is this intentional? If so we should document it clearly and expand on the error message bellow (otherwise we should make it more flexible).

@gatorsmile
Copy link
Member

We are closing it due to inactivity. please do reopen if you want to push it forward. Thanks!

@asfgit asfgit closed this in b32bd00 Jun 27, 2017
@jiayue-zhang
Copy link
Contributor Author

@holdenk Thanks for review. I'll combine type(None) in the isinstance. I also made Scala and Python to accept null more generally and in the same way.
PR is reopened at: #18820
@gatorsmile please also take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants