[SPARK-11923][ML] Python API for ml.feature.ChiSqSelector #10186

yinxusen · 2015-12-08T03:51:54Z

https://issues.apache.org/jira/browse/SPARK-11923

SparkQA · 2015-12-08T04:11:04Z

Test build #47306 has finished for PR 10186 at commit a5e72ad.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol):
 * class ChiSqSelectorModel(JavaModel):

SparkQA · 2015-12-08T04:43:55Z

Test build #47307 has finished for PR 10186 at commit f49e231.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol):
 * class ChiSqSelectorModel(JavaModel):

yanboliang · 2015-12-09T10:33:00Z

python/pyspark/ml/feature.py

yanboliang · 2015-12-09T10:36:41Z

Looks good except minor issues.

holdenk · 2015-12-10T00:21:00Z

python/pyspark/ml/feature.py

So I think we probably don't need the "#"s in the pydoc

yinxusen · 2015-12-10T00:34:30Z

Thanks for the comments, @holdenk and @yanboliang. Strangely, I cannot see @yanboliang's comments on this page; it must be a GitHub issue.

I don't know whether we can make it into 1.6. If not, I'll change the version tag to 1.7 later.

SparkQA · 2015-12-10T01:00:53Z

Test build #47462 has finished for PR 10186 at commit 657a0d4.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol):
 * class ChiSqSelectorModel(JavaModel):

holdenk · 2015-12-10T19:35:22Z

python/pyspark/ml/feature.py

This model is loadable and saveable in Java. I don't see us doing this elsewhere in ml/ yet (although we do it in mllib/), but do we maybe want to use the JavaLoader & JavaSaveable base classes?

Model persistence is important in PySpark, but there is no need to add it in this PR. @yanboliang has a JIRA for adding pipeline persistence in PySpark: https://issues.apache.org/jira/browse/SPARK-11939

Could you please add the selectedFeatures method?
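
For readers unfamiliar with the selector, the following is a minimal pure-Python sketch of what the selected feature indices represent: each feature column is scored with a Pearson chi-squared statistic against the label, and the indices of the top-scoring columns are returned. It is an illustration only, not Spark's implementation; the function names `chi_square_statistic` and `select_top_features` and the toy data are assumptions made for this example.

```python
from collections import Counter


def chi_square_statistic(feature_values, labels):
    """Pearson chi-squared statistic for one categorical feature vs. the label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    statistic = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            # Expected count under independence of feature and label
            expected = f_count * l_count / n
            statistic += (observed[(f, l)] - expected) ** 2 / expected
    return statistic


def select_top_features(rows, labels, num_top_features):
    """Return the sorted indices of the highest-scoring feature columns."""
    num_features = len(rows[0])
    scores = [
        chi_square_statistic([row[j] for row in rows], labels)
        for j in range(num_features)
    ]
    ranked = sorted(range(num_features), key=lambda j: -scores[j])
    return sorted(ranked[:num_top_features])


# Toy data (hypothetical): feature 1 equals the label, feature 0 is independent.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 0, 1]
print(select_top_features(rows, labels, 1))  # [1]
```

In this toy input, feature 1 matches the label exactly while feature 0 is independent of it, so only index 1 is kept.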

yanboliang · 2015-12-13T08:34:11Z

LGTM

thunterdb · 2016-01-07T23:33:44Z

LGTM cc @jkbradley

yinxusen · 2016-01-11T11:19:32Z

Change the version to 2.0.0

SparkQA · 2016-01-11T11:44:57Z

Test build #49141 has finished for PR 10186 at commit aa9d40f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-12T01:51:57Z

python/pyspark/ml/feature.py

nit: indent 1 more space (this line + next line)

jkbradley · 2016-01-12T01:52:12Z

Thanks for the PR! I only had a couple more comments.

yinxusen · 2016-01-12T17:14:00Z

test it please

SparkQA · 2016-01-12T18:10:09Z

Test build #49243 has finished for PR 10186 at commit 0bd1271.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yinxusen · 2016-01-13T03:10:19Z

python/pyspark/ml/feature.py

Ping @jkbradley

I use javaSelectedFeatures because self._call_java("selectedFeatures") returns array('i', [3]), which is strange since the result should be [3]. I suspect something goes wrong when the Scala Array is serialized with SerDe.dumps(javaObject) on the Scala side and then deserialized on the Python side.
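
The mismatch described here can be reproduced with the standard library alone. The sketch below makes no Spark call; it assumes only that the returned value is an `array.array` of C ints, as the repr `array('i', [3])` suggests, and shows that such a value is not a list and does not compare equal to one until it is converted.

```python
from array import array

# Stand-in for the value described above; no Spark call is made here.
raw = array('i', [3])

# An array of C ints is a different type from a list and never equals one.
print(type(raw).__name__)  # array
print(raw == [3])          # False

# Converting explicitly yields the plain list a caller would expect.
selected = list(raw)
print(selected == [3])     # True
```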

yinxusen · 2016-01-13T03:12:30Z

@jkbradley I also found an inconsistency in the return values, so I filed a JIRA: https://issues.apache.org/jira/browse/SPARK-12780

yinxusen · 2016-01-13T03:16:27Z

What's more, CountVectorizerModel.vocabulary in feature.py should return an array to match its Scala counterpart, which is also a Scala Array. However, it returns a Python tuple.

jkbradley · 2016-01-14T01:29:31Z

mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala

This isn't needed for Java, so I'd make it a private API. But hopefully we can remove this altogether once [https://github.com//pull/10724] gets merged.

jkbradley · 2016-01-14T01:32:27Z

@yinxusen Thanks! Let's get your other PR in first, and then update this PR.

yinxusen · 2016-01-14T03:21:43Z

@jkbradley Yes, sure.

yinxusen · 2016-01-15T09:04:24Z

python/pyspark/ml/feature.py

@jkbradley I learned from @holdenk's PR #10085 that we can convert the JavaArray directly with list in Python, so there is no need to call _call_java().
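
As a rough illustration of that conversion, the sketch below assumes the Java array proxy supports Python's sequence protocol (`__len__` and `__getitem__`). `FakeJavaArray` and `FakeSelectorModel` are hypothetical stand-ins written for this example; they are not py4j or PySpark classes.

```python
class FakeJavaArray:
    """Hypothetical stand-in for a Java int[] proxy; not a py4j class."""

    def __init__(self, values):
        self._values = tuple(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        # Raises IndexError past the end, which is what stops list()
        return self._values[index]


class FakeSelectorModel:
    """Hypothetical model wrapper exposing the selected indices as a list."""

    def __init__(self, java_array):
        self._java_array = java_array

    @property
    def selected_features(self):
        # list() walks the sequence protocol and copies the elements
        return list(self._java_array)


model = FakeSelectorModel(FakeJavaArray([3]))
print(model.selected_features)  # [3]
```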

SparkQA · 2016-01-15T09:28:03Z

Test build #49453 has finished for PR 10186 at commit 223fdf4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-16T00:46:39Z

See comment on [https://github.com//pull/10724]. We'll return to this PR after [https://github.com//pull/10772] gets merged.

yinxusen · 2016-01-26T17:04:55Z

test it please

SparkQA · 2016-01-26T17:35:07Z

Test build #50112 has finished for PR 10186 at commit 3fca95e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yinxusen · 2016-01-26T17:37:15Z

@jkbradley Another PR related to #10772.

jkbradley · 2016-01-26T19:56:08Z

LGTM
Merging with master
Thanks for this & the other fix!

yinxusen added 6 commits November 26, 2015 22:24

add QuantileDiscretizer in Python

a11558e

add ChiSqSelector in Python

670821b

add class exports

05f3edd

Merge branch 'master' into SPARK-11987

3789867

add java competible

3a33327

remove QuantileDiscretizer

a5e72ad

change tags from 1.7 to 1.6

f49e231

yanboliang reviewed Dec 9, 2015

python/pyspark/ml/feature.py Outdated

remove #

holdenk reviewed Dec 10, 2015

python/pyspark/ml/feature.py Outdated

So I think we probably don't need the "#"s in the pydoc

fix # and \

657a0d4

holdenk reviewed Dec 10, 2015

yinxusen added 2 commits January 11, 2016 19:16

Merge branch 'master' into SPARK-11923

61f3827

merge with master

aa9d40f

jkbradley reviewed Jan 12, 2016

python/pyspark/ml/feature.py Outdated

nit: indent 1 more space (this line + next line)

yinxusen added 2 commits January 12, 2016 13:06

fix nits

e276440

add selectedFeatures

0bd1271

yinxusen reviewed Jan 13, 2016

jkbradley reviewed Jan 14, 2016

yinxusen added 2 commits January 15, 2016 17:00

change the calling

32cdbb0

remove import

223fdf4

yinxusen reviewed Jan 15, 2016

yinxusen mentioned this pull request Jan 15, 2016

[SPARK-12780] Inconsistency returning value of ML python models' properties #10724

Closed

yinxusen added 2 commits January 26, 2016 09:01

merge with master

83e7a90

change to call_java

3fca95e

asfgit closed this in 8beab68 Jan 26, 2016

6 participants