[SPARK-20899][PySpark] PySpark supports stringIndexerOrderType in RFormula #18122

actuaryzhang · 2017-05-26T16:45:17Z

What changes were proposed in this pull request?

Adds PySpark support for the stringIndexerOrderType parameter of RFormula, as in #17967.

How was this patch tested?

docstring test

actuaryzhang · 2017-05-26T16:45:57Z

@felixcheung @yanboliang @viirya

SparkQA · 2017-05-26T17:03:51Z

Test build #77428 has finished for PR 18122 at commit 4bca4d9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-05-26T20:11:13Z

python/pyspark/ml/feature.py

                            typeConverter=TypeConverters.toBoolean)

+    stringIndexerOrderType = Param(Params._dummy(), "stringIndexerOrderType",
+                                   "How to order categories of a string FEATURE column used by " +


Is capitalizing FEATURE common here?

Changed it to lower case now.
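
For readers unfamiliar with the parameter, here is a minimal pure-Python sketch of what the four ordering options mean and how the last category after ordering is dropped when string columns are encoded. It is an illustration, not the Spark implementation. The helper names `order_categories` and `encode` are hypothetical, the alphabetical tie-breaking for the frequency orderings is an assumption of this sketch, and the first input row is assumed (only its expected features appear in this thread).

```python
from collections import Counter


def order_categories(values, order_type="frequencyDesc"):
    """Return the distinct categories of `values` in the requested order.

    Hypothetical helper for illustration only; Spark does this inside StringIndexer.
    """
    counts = Counter(values)
    if order_type == "frequencyDesc":
        # Most frequent first. Alphabetical tie-breaking is an assumption of this sketch.
        return sorted(counts, key=lambda c: (-counts[c], c))
    if order_type == "frequencyAsc":
        # Least frequent first, same assumed tie-breaking.
        return sorted(counts, key=lambda c: (counts[c], c))
    if order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    if order_type == "alphabetAsc":
        return sorted(counts)
    raise ValueError("unsupported order type: %s" % order_type)


def encode(values, order_type):
    """One-hot encode `values`, dropping the last category after ordering."""
    kept = order_categories(values, order_type)[:-1]
    return [[1.0 if v == c else 0.0 for c in kept] for v in values]


# Rows mirroring the test data in this PR (the first row is assumed).
x = [1.0, 2.0, 0.0]
s = ["a", "b", "a"]

# With alphabetDesc the order is ["b", "a"], so "a" is dropped and "b" gets an indicator.
features = [[xi] + row for xi, row in zip(x, encode(s, "alphabetDesc"))]
print(features)
```

Under these assumptions the result lines up with the expected feature vectors used in the unit test later in this thread.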

SparkQA · 2017-05-26T22:14:27Z

Test build #77440 has finished for PR 18122 at commit c3f4430.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang

One minor comment, otherwise LGTM. Thanks!

yanboliang · 2017-05-28T16:04:53Z

python/pyspark/ml/feature.py

+    |0.0|2.0|  b|[2.0,1.0]|  0.0|
+    |0.0|0.0|  a|(2,[],[])|  0.0|
+    +---+---+---+---------+-----+
+    ...


Could you move the newly added test to tests.py? We keep only the basic doc tests here, since they serve as both tests and examples; other tests should go in tests.py. Thanks.

SparkQA · 2017-05-29T17:53:51Z

Test build #77506 has finished for PR 18122 at commit 3510e24.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-05-29T20:54:22Z

Test build #77508 has finished for PR 18122 at commit 320203e.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class SparkMLTests(ReusedPySparkTestCase):

SparkQA · 2017-05-30T01:38:19Z

Test build #77509 has finished for PR 18122 at commit 4af4b35.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

actuaryzhang · 2017-05-30T01:58:50Z

@yanboliang I have moved the tests to the test file. Please let me know if there is anything else needed. Thanks.

viirya · 2017-05-30T02:16:32Z

LGTM

yanboliang

One very minor comment, thanks!

yanboliang · 2017-05-30T16:17:52Z

python/pyspark/ml/tests.py

+        observed = transformedDF.select("features").collect()
+        expected = [[1.0, 0.0], [2.0, 1.0], [0.0, 0.0]]
+        for i in range(0, len(expected)):
+            self.assertTrue((observed[i]["features"].toArray() == expected[i]).all())


Minor: we usually prefer self.assertTrue(all(observed[i]["features"].toArray() == expected[i])).
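
The two assertion styles discussed here behave the same for one-dimensional vectors. The sketch below shows this with plain NumPy arrays standing in for the result of `toArray()`; the `observed` values are assumed for illustration and are not collected from Spark.

```python
import numpy as np

# Stand-ins for observed[i]["features"].toArray(); values assumed for illustration.
observed = [np.array([1.0, 0.0]), np.array([2.0, 1.0]), np.array([0.0, 0.0])]
expected = [[1.0, 0.0], [2.0, 1.0], [0.0, 0.0]]

method_style = []
builtin_style = []
for obs, exp in zip(observed, expected):
    elementwise = obs == exp                   # element-wise boolean array
    method_style.append(bool(elementwise.all()))  # ndarray method, as in the original diff
    builtin_style.append(all(elementwise))        # Python builtin, as suggested in review

print(method_style, builtin_style)
```

One difference worth keeping in mind: the builtin `all` iterates over the first axis only, so on a two-dimensional comparison it would raise a ValueError, whereas `.all()` reduces over every element. For the 1-D feature vectors in this test both forms work.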

SparkQA · 2017-05-30T16:59:34Z

Test build #77537 has finished for PR 18122 at commit 2e854a8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2017-05-30T17:02:45Z

Merged into master. Thanks, all.

Python port for RFormula stringIndexerOrderType

4bca4d9

felixcheung reviewed May 26, 2017

fix doc issue

c3f4430

yanboliang reviewed May 28, 2017

move test to test file

3510e24

update test

320203e

fix test issues

4af4b35

yanboliang reviewed May 30, 2017

improve tests

2e854a8

asfgit closed this in ff5676b May 30, 2017

actuaryzhang deleted the PythonRFormula branch May 30, 2017 17:12
