[SPARK-20899][PySpark] PySpark supports stringIndexerOrderType in RFormula #18122
Test build #77428 has finished for PR 18122 at commit
python/pyspark/ml/feature.py
Outdated
    typeConverter=TypeConverters.toBoolean)

    stringIndexerOrderType = Param(Params._dummy(), "stringIndexerOrderType",
                                   "How to order categories of a string FEATURE column used by " +
Is capitalizing FEATURE common here?
Changed it to lower case now.
Test build #77440 has finished for PR 18122 at commit
yanboliang left a comment
One minor comment, otherwise LGTM. Thanks!
python/pyspark/ml/feature.py
Outdated
    |0.0|2.0|  b|[2.0,1.0]|  0.0|
    |0.0|0.0|  a|(2,[],[])|  0.0|
    +---+---+---+---------+-----+
    ...
Could you move the newly added test to tests.py? We keep only the basic doc tests here, where they serve as both tests and examples; other tests should be placed in tests.py. Thanks.
Test build #77506 has finished for PR 18122 at commit
Test build #77508 has finished for PR 18122 at commit
Test build #77509 has finished for PR 18122 at commit
@yanboliang I have moved the tests to the test file. Please let me know if there is anything else needed. Thanks.
LGTM
yanboliang left a comment
One very minor comment, thanks!
python/pyspark/ml/tests.py
Outdated
    observed = transformedDF.select("features").collect()
    expected = [[1.0, 0.0], [2.0, 1.0], [0.0, 0.0]]
    for i in range(0, len(expected)):
        self.assertTrue((observed[i]["features"].toArray() == expected[i]).all())
Minor: Usually we prefer to use self.assertTrue(all(observed[i]["features"].toArray() == expected[i])).
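For anyone comparing the two spellings in this suggestion, here is a small sketch outside Spark. It assumes NumPy is installed and uses a hand-built array as a stand-in for what `observed[i]["features"].toArray()` returns; the variable names are illustrative only. For a 1-D vector, both forms should reduce the same elementwise comparison to a single truth value.

```python
import numpy as np

# Stand-in for observed[i]["features"].toArray(); not real Spark output.
observed_row = np.array([2.0, 1.0])
expected_row = [2.0, 1.0]

# Comparing an ndarray with a list of the same length is elementwise.
elementwise = observed_row == expected_row

# Form used in the patch: the ndarray's own reduction method.
method_form = bool(elementwise.all())

# Form suggested in review: Python's builtin all() over the 1-D result.
builtin_form = all(elementwise)

# A mismatching row makes both forms false.
mismatch = observed_row == [2.0, 0.0]
mismatch_method_form = bool(mismatch.all())
mismatch_builtin_form = all(mismatch)
```

The two forms only agree this simply for 1-D arrays; iterating a 2-D comparison result with the builtin `all` would yield rows rather than scalars.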
Test build #77537 has finished for PR 18122 at commit
Merged into master, thanks all.
What changes were proposed in this pull request?
PySpark supports stringIndexerOrderType in RFormula as in #17967.
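To give a feel for what the ordering option controls, here is a pure-Python sketch, not Spark code. It assumes the four option names match those of StringIndexer's `stringOrderType` (`frequencyDesc`, `frequencyAsc`, `alphabetDesc`, `alphabetAsc`); the helper `index_labels` is hypothetical, and the alphabetical tie-break for equal frequencies is an assumption made only to keep the sketch deterministic.

```python
from collections import Counter


def index_labels(values, order_type="frequencyDesc"):
    """Hypothetical helper: map each distinct string to a float index.

    Mimics the idea of ordering categories before indexing them.
    Frequency ties are broken alphabetically here, which is an
    assumption of this sketch and not a statement about Spark.
    """
    counts = Counter(values)
    if order_type == "frequencyDesc":
        ordered = sorted(counts, key=lambda v: (-counts[v], v))
    elif order_type == "frequencyAsc":
        ordered = sorted(counts, key=lambda v: (counts[v], v))
    elif order_type == "alphabetDesc":
        ordered = sorted(counts, reverse=True)
    elif order_type == "alphabetAsc":
        ordered = sorted(counts)
    else:
        raise ValueError("unsupported order type: " + order_type)
    return {label: float(i) for i, label in enumerate(ordered)}


# A string column shaped like the one in the docstring excerpt in this thread.
column = ["a", "b", "a"]
by_frequency = index_labels(column)
by_alphabet_desc = index_labels(column, "alphabetDesc")
```

With this input, `by_frequency` gives "a" the index 0.0 because it occurs most often, while `by_alphabet_desc` gives "b" the index 0.0. The second ordering appears consistent with the docstring rows quoted earlier, where the row with "b" carries a 1.0 in the encoded feature and the rows with "a" do not, assuming the last category after ordering is the one dropped during encoding.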
How was this patch tested?
docstring test