@actuaryzhang
Contributor

What changes were proposed in this pull request?

Adds support for stringIndexerOrderType to RFormula in PySpark, as in #17967.
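
To illustrate what the parameter controls, here is a minimal pure-Python sketch, not Spark code. The helper names `order_categories` and `one_hot_drop_last` are hypothetical. It assumes the four order types accepted by StringIndexer (`frequencyDesc`, `frequencyAsc`, `alphabetDesc`, `alphabetAsc`), assumes that the last category after ordering is dropped when a string feature is encoded, and assumes that frequency ties are broken alphabetically.

```python
from collections import Counter

ORDER_TYPES = ("frequencyDesc", "frequencyAsc", "alphabetDesc", "alphabetAsc")


def order_categories(values, order_type="frequencyDesc"):
    """Return the distinct labels in the order implied by order_type.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    if order_type not in ORDER_TYPES:
        raise ValueError("unsupported order type: %s" % order_type)
    counts = Counter(values)
    if order_type == "alphabetAsc":
        return sorted(counts)
    if order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    # Assumption: ties in frequency are broken alphabetically.
    descending = order_type == "frequencyDesc"
    return sorted(
        counts,
        key=lambda label: (-counts[label] if descending else counts[label], label),
    )


def one_hot_drop_last(value, labels):
    """One-hot encode value, treating the last label as the dropped reference."""
    return [1.0 if value == label else 0.0 for label in labels[:-1]]


values = ["a", "b", "a"]
labels = order_categories(values, "alphabetDesc")
encoded = [one_hot_drop_last(v, labels) for v in values]
print(labels)
print(encoded)
```

Under these assumptions, `alphabetDesc` puts "b" first and drops "a" as the reference category, so only rows containing "b" get a 1.0 in the encoded column.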

How was this patch tested?

docstring test

@SparkQA

SparkQA commented May 26, 2017

Test build #77428 has finished for PR 18122 at commit 4bca4d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

typeConverter=TypeConverters.toBoolean)

stringIndexerOrderType = Param(Params._dummy(), "stringIndexerOrderType",
                               "How to order categories of a string FEATURE column used by " +
Member

Is capitalizing FEATURE common here?

Contributor Author

Changed it to lower case now.

@SparkQA

SparkQA commented May 26, 2017

Test build #77440 has finished for PR 18122 at commit c3f4430.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@yanboliang yanboliang left a comment

One minor comment, otherwise LGTM. Thanks!

|0.0|2.0| b|[2.0,1.0]| 0.0|
|0.0|0.0| a|(2,[],[])| 0.0|
+---+---+---+---------+-----+
...
Contributor

Could you move the newly added test to tests.py? We keep only the basic doctests here, since they serve as both tests and examples; other tests should go in tests.py. Thanks.

@SparkQA

SparkQA commented May 29, 2017

Test build #77506 has finished for PR 18122 at commit 3510e24.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 29, 2017

Test build #77508 has finished for PR 18122 at commit 320203e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class SparkMLTests(ReusedPySparkTestCase):

@SparkQA

SparkQA commented May 30, 2017

Test build #77509 has finished for PR 18122 at commit 4af4b35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@actuaryzhang
Contributor Author

@yanboliang I have moved the tests to the test file. Please let me know if there is anything else needed. Thanks.

@viirya
Member

viirya commented May 30, 2017

LGTM

Contributor

@yanboliang yanboliang left a comment

One very minor comment, thanks!

observed = transformedDF.select("features").collect()
expected = [[1.0, 0.0], [2.0, 1.0], [0.0, 0.0]]
for i in range(0, len(expected)):
    self.assertTrue((observed[i]["features"].toArray() == expected[i]).all())
Contributor

Minor: we usually prefer self.assertTrue(all(observed[i]["features"].toArray() == expected[i])).
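
For reference, a small sketch of the two assertion styles side by side. It assumes NumPy is available and uses a plain 1-D array as a stand-in for the result of `toArray()`; the variable names are illustrative only. For 1-D arrays like these feature vectors, the two forms give the same answer.

```python
import numpy as np

# Stand-in for observed[i]["features"].toArray(); values are illustrative.
observed = np.array([2.0, 1.0])
expected = [2.0, 1.0]

# Comparing an ndarray with a list of the same length is elementwise.
elementwise = observed == expected

# Style used in the patch: the ndarray's own .all() method.
method_style = bool(elementwise.all())

# Style suggested in review: Python's built-in all() over the elements.
builtin_style = all(elementwise)

print(method_style, builtin_style)
```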

@SparkQA

SparkQA commented May 30, 2017

Test build #77537 has finished for PR 18122 at commit 2e854a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Contributor

Merged into master. Thanks, everyone.

@asfgit asfgit closed this in ff5676b May 30, 2017
@actuaryzhang actuaryzhang deleted the PythonRFormula branch May 30, 2017 17:12